The survey says...



WHEN YOU START a new magazine, you guess. Sure, 
you base your theory on some data from potential readers, 
but you still gamble and hope for the best. That's in large part 
what we did when we launched The AmigaWorld Tech 
Journal... we hoped for the best. Were we on the mark? You 
delivered the answer in the results of our recent subscriber 
survey. Oh, we're not perfect yet. 17% say we're not technical 
enough (of course, 15% say we're too technical), but that 60% 
of you that said we're "just right" made my day! Thanks. 

So, who reads The AmigaWorld Tech Journal? 11% are beginners, 
40% are intermediates, 32% are advanced, and 17% are 
developers. 51% of you own Amiga 2000s or 2500s, 20% 
powered up to A3000s, 23% have A500s, and a loyal 29% still 
use A1000s. Not surprisingly, 42% of Tech Journal readers use 
their Amigas primarily for software development. Multisync 
monitors, CD-ROM drives, accelerators, and programming 
tools top your shopping lists. 

As for languages, C is the favorite, with SAS outscoring 
Manx as the compiler of choice. 78% of you want heavy C 
coverage, and 97% want at least moderate coverage. Assem- 
bly follows: 32% called for heavy coverage, 76% for moderate 
or more. ARexx scores third (75% moderate to heavy), BASIC 
comes in low: 60% think it deserves only light attention. 

The majority of you (84%) prefer to have the code exam- 
ples on disk. One respondent went so far as to declare this 
the "best part of the journal." "Let most of the on-disk code 
reflect professional-level programming. We amateurs want to 
know how to do it right," another reader requested. The Peer 
Review Board is dedicated to ensuring that this is the case. 

A 64% majority wants us to go monthly. (Yipes.) 

Reviews are a hot topic of debate. "With only 48 pages to 
fill [now 64], reviews are better kept elsewhere," one reader 
recommended. Most of you seconded this motion, indicating 
if we have a choice between printing reviews or one more 
tutorial article with code, we should run the tutorial. (I agree, 
motion carried.) When we do print reviews, however, most 
of you prefer in-depth examinations of programming tools, 
followed by performance tests of hardware. 

As to articles, specific techniques, OS coverage, and algo- 
rithms are the most requested types. More hardware cover- 
age was a favorite comment. One respondent proposed a 
series of articles that describe the Amiga's various proces- 
sors — basic chip set, enhanced chip set, named chips, 68000, 
68020, '030, and 68882. As you read this, the project is already 
in the works. To the reader who called for an explanation of 
IFF85, we'll go you one better: We're teaming up with CATS 
for an article on the latest version of the venerable standard. 
And no, we won't forget you beginners. In each issue we try 
to include at least one introductory piece and provide back- 
ground sidebars on the weightier subjects. 

One thing we'll always do is listen to your comments and 
suggestions. Don't wait for another survey to let us know 
what you think. ■ 
68030 to 68040 Differences 



What's new to Motorola's latest 
generation of chip? 

By John Meek and Tim Reese 



A HIGH PERFORMANCE 32-bit microprocessor in Mo- 
torola's 68000 family, the 68040 maintains 100% user-code 
compatibility with previous members of the family. The 
'040's Integer Unit, Floating-Point Unit, and Memory Sub- 
system operate in parallel to achieve significantly higher per- 
formance than that of its predecessor. Motorola claims four 
times the performance of a 68020 microprocessor at the same 
clock speed. In practice, we have seen performance of about 
two to four times that of a 68030 (at the same clock rate). 
What is this 68040 and what makes it so much faster than the 
'020 and '030? To answer this question, we will examine some 
of the major philosophical and architectural differences be- 
tween the '040 and the '030. (Consult the MC68040 32-Bit Mi- 
croprocessor User's Manual and the MC68040 Designer's Hand- 
book for complete details regarding issues not discussed here.) 

OPTIMIZATION AND SPEED ENHANCEMENTS 

What kind of changes are necessary to double or quadru- 
ple the speed of a family of processors? Motorola's approach 
was to collect a rather formidable amount of data based on 
existing 68020 and 68030 systems. Millions of cycles of bus 
activity were recorded on various systems running different 
applications and different operating systems. This trace data 
was then analyzed using a high-level statistical performance 
model along with cache and MMU simulators. 

Motorola's engineers based their architectural decisions on 
the '040 performance goals, the collected trace data, customer 
input, ease of implementation, and, of course, silicon usage. 
The instructions with the highest dynamic execution fre- 
quency were optimized. To maximize the execution speed of 
these newly optimized instructions, they pipelined the Inte- 
ger Unit. The most common instructions execute in a single 
clock cycle. The cycle time of the ALU is an important factor 
in optimizing instructions. To achieve high performance, it is 
very desirable for the peak instruction rate to be equal to that 
of the ALU cycle rate. With this in mind, the '040 is designed 
so that the ALU cycle rate is matched to the cache access 
time. The Floating-Point Unit was also optimized and 
pipelined. Of more interest, however, is the fact that the Float- 
ing-Point Unit now resides on-chip. This arrangement great- 
ly reduces the amount of time spent in external arbitration to 
a coprocessor. 

Motorola's analysis also indicated that the cache architecture 
should be based on the Harvard model (separate instruction 
and data spaces), as in the '030. The designers, however, 
changed the caching arrangement from direct mapped (as on 
the '030) to a set-associative type. The associative cache 
scheme provides a much higher hit rate than the 
direct-mapped type (see "Cache Mapping Techniques"). In 
addition, the copy-back style of data caching was incorpo- 
rated into the new design. It has been observed that, de- 
pending on the program flow, copy-back caching may in- 
crease performance by as much as 25% over conventional 
write-through cache schemes. 

Finally, you should understand that the architecture is 
intended to provide most of the memory bandwidth from the 
internal caches, meaning accesses to external memory are op- 
timized for the loading and unloading of the internal cache 
systems. The bus protocol has been simplified to efficiently 
handle cache line bursts and to minimize the latency from a 
cache miss. The simple synchronized start-terminate sequence 
also simplifies the hardware interface to external devices. 

Arguably, the biggest difference between the '030 and the 
'040 is the bus structure. This includes termination, dynam- 
ic bus sizing, transfer attributes versus function codes, burst 
retry, and the output buffers. 

DATA BUS PROTOCOL 

The data bus protocol has been completely changed from 
the previous processors. To optimize the pipelines, the '040 
was designed to be a fully synchronous processor, whereas 
the MC68030 could be used either as a synchronous or an 
asynchronous processor. In the past, the 68K family has used 
AS* and DS* to indicate that the information on the bus is 
valid upon assertion. The '040 deviates from this type of data 
bus protocol by introducing a signal called Transfer Start: 
TS*. Transfer Start, alone, does not indicate that the infor- 
mation on the address and data bus is valid, but rather that 
the information is valid on the next rising edge of BCLK. This 
will help hardware designers as the clock speeds increase for 
the '040, and it allows you to maximize the cycles so that ex- 
tra cycles are not taken up for such things as decoding and 
control when the clock cycles get faster. 

The termination of the cycle is different as well. The 
MC68030 provides two distinct ways to terminate a cycle. The 
bus cycle termination signals DSACKx* and STERM* both 
provide asynchronous or synchronous termination. For ter- 
minating a cycle with the '040, Motorola eliminated the 
DSACKx and STERM signals and introduced a new signal 
called Transfer Acknowledge (TA*). The '040 uses only the 
TA* signal to terminate a cycle. Like the TS* signal, the TA* 
signal is synchronous. It is only valid with the rising edge of 
the BCLK. For systems that absolutely need to operate the 
data bus asynchronously, Motorola provides the data latch 
enable mode. When you select this mode at reset time all 
data will be read by the '040 processor with the Data Latch 
Enable (DLE*) signal, thus providing for an asynchronous 
read. The cycle, however, is still terminated by TA*. 

Figure 1: An illustration of the 68040's data transfer paths. 

The goal of designing any adapter or accelerator board is 
to match the host processor functions with the accelerated 
processor functions. Asynchronous termination can be a 
problem if you do not take special care. If the processor is 
running in both a synchronous and an asynchronous 
environment, then the data latch enable mode may not be 
appropriate. In this case, the asynchronous termination must 
be synchronized or it could create metastability problems. 
At the higher speeds, the problem gets worse. Conversely, if 
signals are synchronized that don't need to be, then cycles 
are wasted on the front and back end of the synchronization 
process. Suffice to say that this takes some thought in an 
adapter design. 



DYNAMIC BUS SIZING 

Unlike the MC68020 and MC68030, the MC68040 no longer 
supports dynamic bus sizing. To understand how this affects 
'040 design, you must know what dynamic bus sizing is (see 
"Dynamic Bus-Sizing Details"). 

The '040 has no dynamic bus-sizing capability. If the pro- 
cessor requires a byte down a particular byte lane, then the 
byte has to be returned to the processor via the correct byte 
lane. This forces byte devices to either multiplex their byte 
data onto the correct byte lane or to access through software 
the exact byte lane by control of the address. Probably, there 
is more hardware and control associated with matching an 
'040 bus structure to a fellow member 68K processor than any 
other aspect of adapter design. 

Another interesting aspect of dynamic bus sizing is that an 
'030 processor will take care of completing four or two 
accesses, respectively, to a device if the processor makes a 
LONGWORD or WORD request to a byte port. This makes 
accessing a device very easy for the software, because all of the 
work is done by a device such as the '030. It actually can 
eliminate the need for the software to even know what size 
the data port is on an external device. For the '040, because 
the processor does not know about such things as 8- or 16- 
bit ports, the processor reads as much data as requested upon 
termination of the cycle. Therefore, it can be disastrous to 
request a LONGWORD of information for a byte port because 
only one byte of data will be correct. 

In adapter design, the challenge is to make the '040 oper- 
ate exactly like the 68030 for dynamic bus operation. This 
means external control initiates and terminates cycles inde- 
pendent of the 68040 so that all of the data is present and is 
on the correct byte lanes. 
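
As a rough illustration of the byte-lane bookkeeping described above, here is a short C sketch for a 32-bit port. It assumes the usual 68K big-endian convention (byte offset 0 travels on D31:24), and the function names are hypothetical; they are not taken from Motorola's documentation.

```c
#include <stdint.h>

/* Hypothetical helper: bit position of the low bit of the byte lane
 * that carries the byte at `addr` on a 32-bit port.
 * Assumes the big-endian 68K convention: offset 0 -> D31:24,
 * offset 1 -> D23:16, offset 2 -> D15:8, offset 3 -> D7:0. */
static unsigned lane_shift(uint32_t addr)
{
    return 24u - 8u * (addr & 3u);
}

/* Extract the byte for `addr` from a 32-bit value sampled off the bus.
 * Without dynamic bus sizing (the '040 case), external logic must have
 * placed the byte on this lane before the cycle is terminated. */
static uint8_t byte_from_bus(uint32_t bus_value, uint32_t addr)
{
    return (uint8_t)(bus_value >> lane_shift(addr));
}

/* Number of bus cycles an '030 runs on its own when an aligned operand
 * of `operand_bytes` is requested from a port `port_bytes` wide. */
static unsigned cycles_with_bus_sizing(unsigned operand_bytes,
                                       unsigned port_bytes)
{
    if (port_bytes >= operand_bytes)
        return 1u;
    return operand_bytes / port_bytes;
}
```

With these helpers, a LONGWORD request to a byte port works out to four cycles on the '030 and a WORD request to two, which is the work an '040 adapter has to reproduce in external logic.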

TRANSFER ATTRIBUTES VS. FUNCTION CODES 

For the 68040, function codes are eliminated. Previously, 
the 68K family used function codes to indicate the address 
space type for a bus cycle: supervisor, user, or CPU, and 
program versus data space. Some designs have used these bits 
and others have completely neglected their use in the system 
design. The 68040 replaces the function-code bits with the 
transfer-modifier and transfer-type bits. The three 
transfer-modifier bits are essentially the same as the old 
function-code bits, with some additional encoding where the 
function-code bits were reserved. The encoding now shows Data 
Cache Push Accesses and MMU Table Search Accesses, but no 
longer indicates CPU address space. In another change, the 
transfer-modifier bits indicate the interrupt level that is being 
acknowledged during an interrupt-acknowledge cycle. Motorola 
has also introduced two new signals, the transfer-type 
bits, that indicate which type of cycle is taking place on the 
'040 bus. This information is necessary for multiprocessing 
systems to maintain cache coherency. 
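
To make the encoding more concrete, the C sketch below decodes the two transfer-type bits (TT1-TT0) and the three transfer-modifier bits (TM2-TM0). The numeric encodings are our recollection of the tables in the MC68040 User's Manual and should be treated as an assumption to verify there; the type and function names are hypothetical.

```c
/* Transfer-type (TT1-TT0) values. Encodings are an assumption to be
 * checked against the MC68040 User's Manual. */
enum tt { TT_NORMAL = 0, TT_MOVE16 = 1, TT_ALTERNATE = 2, TT_ACKNOWLEDGE = 3 };

/* Describe a TM2-TM0 value for a normal or MOVE16 access.
 * Values 1, 2, 5, and 6 line up with the old function codes; the
 * value 7, which used to mean CPU space, is assumed reserved here. */
static const char *tm_name(unsigned tm)
{
    static const char *const names[8] = {
        "data cache push", "user data", "user code",
        "MMU table search (data)", "MMU table search (code)",
        "supervisor data", "supervisor code", "reserved"
    };
    return names[tm & 7u];
}

/* During an acknowledge cycle the TM bits carry the interrupt level.
 * Returns -1 for every other cycle type. */
static int acknowledged_level(enum tt tt, unsigned tm)
{
    return (tt == TT_ACKNOWLEDGE) ? (int)(tm & 7u) : -1;
}
```

An adapter that must regenerate '030-style function codes for the host would need a lookup along these lines, plus a rule for the encodings that have no function-code equivalent.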

BURST CYCLES 

A burst is a special cycle that takes advantage of RAM 
technology to obtain data more quickly. If a memory cycle 
takes four clock cycles, normally, a line access (four LONG- 
WORDS) takes 16 clock cycles (4+4+4+4). If burst technolo- 
gy is used, however, the line access takes 10 cycles (4+2+2+2) 
or even 7 cycles (4+1+1+1). There are only a few differences 
between the burst cycles of the 68040 and the 68030. How- 
ever, these differences are critical in any adapter or accelera- 
tor design. The 68030 can handle a wrap of address with a 
burst cycle. In other words, a burst need not be on a 16-byte 
boundary (A2/A3≠0). And of course, the external device 
needs to wrap the address because A2 and A3 are static. The 
68040 requires that a line access be aligned to a 16-byte 
boundary (A2/A3=0). This really isn't much of a problem in 
design as long as the external devices know how to respond 
to a burst with the 68040. 
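
The cycle counts above, and the address wrap that an external device performs for an '030-style burst, can be sketched in a few lines of C. The function names are hypothetical, and the wrap order shown (stepping A3:A2 modulo four within the same 16-byte line) is our understanding of what the responding device is expected to do.

```c
/* Clock cycles for a four-longword line access: the first access costs
 * `first` clocks and each of the remaining three costs `next` clocks. */
static unsigned line_access_clocks(unsigned first, unsigned next)
{
    return first + 3u * next;
}

/* Fill `out` with the four longword addresses of a wrapping burst that
 * starts at `start`. A3:A2 step through the line and wrap within the
 * same 16-byte block. */
static void burst_wrap_sequence(unsigned long start, unsigned long out[4])
{
    unsigned long base = start & ~0xFUL;      /* 16-byte line base */
    unsigned long lw = (start >> 2) & 3UL;    /* starting longword (A3:A2) */
    for (unsigned i = 0; i < 4; i++)
        out[i] = base | (((lw + i) & 3UL) << 2);
}
```

Calling line_access_clocks(4, 4), (4, 2), and (4, 1) reproduces the 16-, 10-, and 7-cycle figures quoted in the text.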

A subtle difference that deserves mentioning is the way 
retry is done on burst between the 68040 and the 68030. The 
68030 can retry on any access during a burst. The 68040 can 
only retry on the first access on a burst. This can cause a 
great deal of trouble in designing an '040 adapter for a 
system that allows retry during any cycle in a burst. 

The way the burst can be inhibited is altered as well. The 
68030 can end the burst on any access if CIIN becomes 
active. This is not the case with the 68040. The 68040 looks 
at TCI only on the first access, and TCI doesn't affect the 
burst. A new signal called TBI can become active on the first 
cycle and inhibit the intended burst. Note that after the first 
cycle in a line access, TCI and TBI are both ignored by the 
68040 processor. 

OUTPUT BUFFERS 

The MC68040 has a flexible output buffer scheme com- 
pared to that of the '030. The '040 has two different output 
buffers that are selected at reset. The large buffers contain 
55mA drivers that are designed to drive 50-ohm terminated 
transmission lines with a minimum amount of delay. The 
small buffers contain 5mA drivers designed to drive unter- 
minated lines. The small buffers have more delay than the 
large buffers but dissipate significantly less power. Normal- 
ly, the '040 can dissipate up to 3 watts, but with all of the large 
buffers enabled it can dissipate 5 watts of power. This could 
be a real problem in some designs. Therefore, Motorola has 
given the designer the flexibility of choosing the output 
buffers in three groups to optimize power versus speed. A de- 
signer may need to have large buffers enabled on a portion 
of the outputs, but may want the others to be small buffers 
for power dissipation reasons. 

Dynamic Bus-Sizing Details 

Two terms must be defined for us to discuss dynamic bus 
sizing. Motorola extensively uses the term port when talking 
about dynamic bus sizing. A port is nothing more than a 
device with a set data-bus size. For example, an eight-bit 
peripheral would be considered an eight-bit port, a 16-bit 
peripheral or any external device that has 16 bits is 
considered a 16-bit port, and so forth. The second term is 
byte lane. A 32-bit bus, of course, has four byte lanes. Each 
lane is distinct (31:24, 23:16, 15:8, 7:0), and an address is 
associated with each byte (see Figure 2). Therefore, the 
processor, depending on the instruction, expects data on 
particular byte lanes. 

Figure 2: The four byte lanes of a 32-bit bus. 

Dynamic bus sizing requires that the byte or word port be 
fixed on set byte lanes. This actually is helpful for a 
hardware designer because the processor can request bytes 
from any of the byte lanes (as per the address being 
requested by the current instruction), and the designer can 
give the bytes on the set byte lanes to which the device is 
attached. Through dynamic bus sizing, the processor then 
dynamically puts the byte on the correct byte lane for the 
instruction being processed. This, of course, is advantageous 
to a device that can only control a single byte lane. 
- JM and TR 

4 November/December 1991 

ARBITRATION 

Unlike current members of the 68000 family of processors, 
the 68040 provides no on-chip arbitration for the external 
bus. The '040 was designed to be a slave with an external 
arbiter controlling all of the bus arbitration. This scheme 
provides more flexibility in the design of board-level arbitration. 
The arbiter may be set up to mimic 68000-style processors (in 
which the 68040 is the de facto bus master), or it may treat 
the 68040 as one of many potential bus masters. This has giv- 
en us a lot of flexibility in our design to do exactly what we 
want to do in terms of priority and speed with our arbitra- 
tion system. The '040 now has to request the bus (BR*) any 
time that any external transfer takes place. The external ar- 
biter then grants the bus to the '040 (BG*), and the '040 re- 
sponds with (BB*). The largest implication of moving the ar- 
biter from internal to external is that now you can have 
multiple '040s on the same bus. With the new scheme and be- 
cause bus arbitration is synchronous, it is possible to switch 
processors in a single clock cycle. This al- 
lows for a convenient strategy in the fu- 
ture for multiprocessor systems. 
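
As one example of what an external arbiter might do, the C sketch below models a single synchronous arbitration step with a fixed-priority policy. The policy, the number of masters, and all names are assumptions made for illustration; the article does not prescribe a particular scheme, and a real arbiter would be implemented in hardware.

```c
#include <stdbool.h>

#define NUM_MASTERS 3   /* hypothetical board: index 0 is the '040 */

/* One synchronous arbitration step (one BCLK edge), as a sketch.
 * br[i] : master i is requesting the bus (its BR* is asserted)
 * busy  : some master is still driving the bus (BB* asserted)
 * owner : master currently granted, or -1 for none
 * Returns the master that should receive BG* next; lowest index wins. */
static int arbiter_step(const bool br[NUM_MASTERS], bool busy, int owner)
{
    if (busy && owner >= 0)
        return owner;                 /* do not preempt mid-transfer */
    for (int i = 0; i < NUM_MASTERS; i++)
        if (br[i])
            return i;
    return -1;                        /* bus idle, nobody granted */
}
```

Putting the '040 at index 0 approximates the 68000-style arrangement in which the processor is the de facto bus master; reordering the loop or rotating the starting index would treat it as one of several equal masters.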



MEMORY SUBSYSTEM 

The 68040 memory subsystem must 
provide instructions and data to the In- 
teger and Floating-Point Units at a suffi- 
cient rate to match their performance. 
Keeping these units fed from a single ex- 
ternal bus is not possible, so the memo- 
ry unit is designed to provide the major- 
ity of the required bandwidth from the 
internal caches. The external bus is opti- 
mized to handle cache loading and un- 
loading. To this end, the system is Har- 
vard architecture (as in the 68030). 

Beyond the major architecture, the 
'030 and '040 subsystems diverge. The 
large difference is mainly attributed to the performance 
increase required of the '040. The '040 has two separate 
memory management units (MMUs) and two 4K caches (one 
each for instruction and data). The '030 also has two 
separate (smaller) cache units, but only a single MMU. In the 
'040, each MMU provides a separate address translation 
cache (ATC) of 64 entries and a pair of transparent transla- 
tion registers (TTR) that operate in parallel with the caches. 
The MMUs supply full (32-bit) logical-to-physical address 
translation providing for complete memory management in 
a virtual, demand-paged environment. The caches in the 
'030 are addressed logically as opposed to the '040's phys- 
ically addressed caches. The reasoning behind this is ex- 
plained in the Caches section below. Each page (4K or 8K) 
may independently specify cache modes, write protection, 
and supervisor/user protection. The TTRs may specify the 
same attributes without translating (transparent transla- 
tion) addresses for memory segments on 16 megabyte 
boundaries. 

The large minimum page size (relative to the '030) is a 
result of the fact that ATC and physical cache lookups can 
occur at the same time. This means, of course, that the same 
bits cannot be used for both lookups. The reduction in the 
available bits increases the minimum page size. Some simple 
"four-way set-associative 4K byte cache" math says: 
Minimum page size = 2**10 = 1K byte (the proof is left to 
you...). Looking to the future, Motorola elected to increase 
this size in anticipation of larger caches. The debate over 
minimum page size, however, rages on. Most existing systems 
use a 4K page size, while the newer ones use 8K. Larger 
page sizes reduce ATC faults and increase paging efficiency 
to disks. Unfortunately, they also increase internal 
fragmentation (thereby boosting the odds that unused data 
is unnecessarily paged to and from disk). 
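
One way to work the 1K figure: the set index and byte offset together address exactly one way of the cache, and those bits must come from the untranslated part of the address if the cache lookup is to start before the ATC delivers a physical page. The C sketch below does that arithmetic; the function names are hypothetical, and this is our reading of the math rather than Motorola's stated derivation.

```c
/* Bytes covered by one way of a set-associative cache. The set index
 * and byte offset together address this many bytes, so they must fit
 * inside the page offset for cache and ATC lookups to overlap. */
static unsigned long bytes_per_way(unsigned long cache_bytes, unsigned ways)
{
    return cache_bytes / ways;
}

/* log2 for exact powers of two. */
static unsigned log2_exact(unsigned long n)
{
    unsigned bits = 0;
    while (n > 1) {
        n >>= 1;
        bits++;
    }
    return bits;
}
```

For the '040, bytes_per_way(4096, 4) gives 1024 bytes, and log2_exact(1024) gives the 10 address bits behind the 2**10 in the text.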

Fortunately, the user is relieved of the responsibility of 
dealing with any of these MMU-related issues (one of the 
reasons why user code remained 100% compatible with the 
68030). Supervisor-only instructions are provided to main- 
tain the TTRs and ATCs. Separate ATCs may be maintained 
for Supervisor and User spaces. The ATC-loading algorithm 
(a hardware tablewalk) uses separate User and Supervisor 
root pointers as entry points into the three-level table struc- 
ture of descriptors. For additional information regarding 
this table structure, refer to the MC68040 32-Bit Micropro- 
cessor User's Manual. 

There are a number of other Supervi- 
sor-level differences between the mem- 
ory management philosophies of the 
two chips. As these issues are not of 
general interest, they will not be dis- 
cussed here. 






CACHES 

The cache system, while retaining the 
same major architecture as the '030, has 
some very significant differences and 
enhancements. A separate bus controller 
loads and unloads caches by burst trans- 
fer. The use of a separate bus controller, 
along with the MMU and cache con- 
trollers, allows the internal caches to 
provide the majority of the required 
memory bandwidth (cache lookups and 
external cycles may occur simultaneously). 

Cache size grew from 256 bytes in the MC68030 to 4096 
bytes in the MC68040. The MC68040 retained the 68030's 
Harvard-style architecture, providing separate 4K caches for 
instructions and data. In the '040, each cache is organized as 
four-way set-associative with 64 sets of four lines each (for ex- 
planation, see "Cache Mapping Techniques"). Each line con- 
tains 16 bytes of data or instructions, an address tag field, and 
state information. This is contrasted with the direct-mapped 
cache organization of the '030. Motorola engineers chose a 
four-way set-associative organization to minimize silicon 
area and maximize hit rates (a fully associative cache would 
actually provide greater hit rates, but the required silicon real 
estate was prohibitive). Motorola also studied other organi- 
zations (direct-mapped, two-way set-associative), but decid- 
ed on four-way set-associative for its superior performance 
in a limited space. 
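
Given the geometry above (16-byte lines, 64 sets, four ways), a physical address splits into a byte offset, a set index, and a tag. The C sketch below performs that split arithmetically. The structure and function names are hypothetical, and the sketch says nothing about how the '040 stores or compares tags internally.

```c
#include <stdint.h>

/* Geometry taken from the text: 16-byte lines, 64 sets, four ways. */
enum { LINE_BYTES = 16, NUM_SETS = 64, NUM_WAYS = 4 };

struct cache_addr {
    uint32_t tag;     /* remaining high-order bits  */
    unsigned set;     /* 6 bits: which of 64 sets   */
    unsigned offset;  /* 4 bits: byte within a line */
};

/* Split a physical address into the three fields used for lookup. */
static struct cache_addr split_address(uint32_t phys)
{
    struct cache_addr a;
    a.offset = phys % LINE_BYTES;
    a.set    = (phys / LINE_BYTES) % NUM_SETS;
    a.tag    = phys / (LINE_BYTES * NUM_SETS);
    return a;
}
```

On a lookup, the set index selects one set and only the four tags in that set need to be compared, which is where the saving over a fully associative cache comes from.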

As in the '030, the instruction cache is limited to two modes 
of operation: cacheable and noncacheable. The data cache, 
however, has grown to four modes of operation: cacheable 
copy-back, cacheable write-through, noncacheable nonseri- 
alized, and noncacheable serialized. Copy-back caching pro- 
duces maximum performance and minimum bus utilization 
in single-processor implementations. For multiple proces- 
sors, the proper use of copy-back requires complicated bus 
and cache state protocols to maintain cache coherency. This 
is one of the reasons that write-through caching was also 
supported on the MC68040. 

Both noncacheable modes operate such that the data cache 
is never used or loaded. Any data operation directly access- 
es external memory. Noncacheable serialized mode simply 
guarantees that all reads and writes will occur in program 
sequence. This mode may sometimes be necessary due to the 
pipeline nature of the Integer Unit (IU). The IU may actual- 
ly reorder accesses to maximize performance. 

Operation of the cacheable modes is a bit different than in 
the '030. For both cacheable modes, a read miss causes a line 
to be loaded into the cache. Writes are handled differently, 
depending on the selected mode for that page. A write to a 
write-through page will always update the external memo- 
ry and will update the cache if the operand is resident. This 
is the same type of data caching supported by the MC68030. 
An important difference between the '030 and '040 write- 
through cache involves cache-line allocation. The '040 oper- 
ates under a no-write-allocate policy (writes that miss in the 
cache are written to external memory, but the corresponding 
line is not loaded into the cache). Write allocation in the '030 
is programmable via a bit in the Cache Control Register. 

In copy-back mode, a write miss to a copy-back page will 
allocate a cache entry for loading, then load it from exter- 
nal memory. The write data is then written to the cache 
only. Because the operand value in external memory no 
longer matches the value in the cache, the entry is 
considered dirty. Dirty entries are written to external memory in 
the event of cache flush or line replacement (the cache is full, 
and a new line must be loaded). If the cache is full, a pseu- 
do-random algorithm designates the cache line to be re- 
placed. This line is then moved to an internal temporary 
buffer and tagged for update (it is placed in the write queue 
and will be written to the external memory in its turn). In 
the event of a write hit to a copy-back page, the cache line 
is updated, and the dirty bits are set for the appropriate 
LONGWORDS within the line. Copy-back data caching is 
not available on the MC68030. 
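
The difference in bus traffic between the two cacheable modes is easy to see in a toy model. The C sketch below tracks a single cache line and counts external reads and writes under each write policy. It is an illustration only: the one-line cache, the whole-line dirty flag, and all names are simplifying assumptions, not a description of the '040's internal logic.

```c
#include <stdbool.h>

/* A one-line toy cache, just enough to contrast the two write policies. */
struct toy_cache {
    bool valid;          /* line holds data for `tag`           */
    bool dirty;          /* line differs from external memory   */
    unsigned long tag;   /* which memory line is resident       */
    unsigned bus_writes; /* writes that reached external memory */
    unsigned bus_reads;  /* line loads from external memory     */
};

/* Write-through, no-write-allocate: memory is always updated; a miss
 * does not load the line, and no dirty state is ever created. */
static void write_through(struct toy_cache *c, unsigned long tag)
{
    (void)tag;           /* hit or miss, the bus write happens */
    c->bus_writes++;
}

/* Copy-back: a miss allocates the line (pushing a dirty victim first),
 * then the write goes to the cache only and marks the line dirty. */
static void copy_back(struct toy_cache *c, unsigned long tag)
{
    if (!c->valid || c->tag != tag) {
        if (c->valid && c->dirty)
            c->bus_writes++;      /* push the dirty victim */
        c->bus_reads++;           /* load the new line */
        c->valid = true;
        c->tag = tag;
    }
    c->dirty = true;
}
```

Ten writes to the same line cost ten bus writes in write-through mode, but only one line load and no bus writes in copy-back mode until the line is eventually replaced or flushed.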

Cache-line loading is part of the newly optimized memo- 
ry subsystem in the '040. When a new cache line is required, 
the cache controller requests a line read from the bus con- 
troller. The bus controller requests a burst-read transfer by in- 
dicating a line access on the size signals (the '030 asserts the 
signal CBREQ). The responding device may indicate that 
burst transfers are not supported by asserting the Transfer 
Burst Inhibit (TBI) signal. Similarly, an '030 system may in- 
hibit a burst transfer by failing to supply CBACK (Cache 
Burst ACKnowledge) with its standard termination. Bursting 
on the '030 may also be controlled by setting the burst-enable 



Cache Mapping Techniques 



One of the key aspects of caching is 
the method by which main memory 
is mapped to the cache. The method 
is very important because the cache is 
typically much smaller than main 
memory. The result, of course, is that 
several lines in main memory may 
map to a single cache line. The objec- 
tive is to maximize cache hits and to 
minimize thrashing (repeated cache 
misses resulting from consecutive 
memory accesses mapping to the 
same cache line). The cache needs to 
keep an inventory of its contents so 
that it can react the next time the pro- 
cessor requests data. The high order 
portion of the requested address (the 
tag) is compared to this list of infor- 
mation held in the cache. To indicate 
a match, logic compares each tag to 
the appropriate bits from the re- 
quested address. The amount of com- 
parison logic depends on the cache's 
mapping technique. The most fre- 
quently used mapping techniques 
(also known as placement policies) 
are direct, fully associative, and set 
associative. 

Direct mapping is the simplest of the 
placement policies. Any given line 
from main memory is placed in the 
cache at the same line modulo N 
(where N is the number of lines in 
the cache). The real address is broken 
into three fields: the tag field, the line 
field, and the byte field. The tag field 
checks whether the addressed entry 
in the cache contains the requested 
line. The line field is used to access an 
entry in the cache. The byte field is 
used to address bytes within a line. 
Replacing lines in the cache (replace- 
ment policy) is trivial. Because a par- 
ticular line in memory maps to a par- 
ticular cache line, there is no choice as 
to which cache line is replaced. 
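
The direct-mapped placement rule just described reduces to a division and a remainder. The C sketch below computes the line and tag fields for an arbitrary geometry; the function names are hypothetical, and the 16-line, 16-byte geometry used in the note that follows is chosen only as an example.

```c
/* Direct mapping: memory line L always lands in cache line L mod N. */
static unsigned long direct_mapped_line(unsigned long addr,
                                        unsigned long line_bytes,
                                        unsigned long num_lines)
{
    return (addr / line_bytes) % num_lines;
}

/* The tag is what remains above the line and byte fields. */
static unsigned long direct_mapped_tag(unsigned long addr,
                                       unsigned long line_bytes,
                                       unsigned long num_lines)
{
    return addr / (line_bytes * num_lines);
}
```

With 16 lines of 16 bytes, addresses 0x40 and 0x140 both map to line 4 but carry different tags, so a loop that alternates between them evicts the line on every access. That is the thrashing case mentioned earlier.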

The fully associative mapping tech- 
nique represents the opposite ex- 
treme from direct mapping. Any line 
from main memory may be mapped 
to any entry in the cache. A large tag 
field is required (enough to address 
each line of the cache), because it is 
possible to map to any line in the 
cache. The associative comparison of 
the tags is a time-consuming process 
since it extends over the entire 
length of the cache. Because data 
may be placed anywhere in the 
cache, additional logic is required to 
determine where the new data will 
be placed. This determination is 
complicated by a full cache (which 
line should be replaced?). Fully as- 
sociative caches normally use a re- 
placement policy that saves recently 
used data and instructions. This 
mapping method produces very 
high hit ratios and eliminates thrashing, 
but at the expense of complicated 
and costly hardware. 

A good compromise between the 
two extreme cases is called the set-as- 
sociative mapping technique. This 
method allows main memory lines to 
be mapped to a limited number of 
cache entries. Each alternate cache 
mapping for a given location in mem- 
ory is termed a way. Therefore, a 
two-way set-associative cache allows 
any location in memory to map to 
two cache entries. Similarly, a four- 
way set-associative cache allows four 
entries for any given memory loca- 
tion. Generally speaking, the perfor- 
mance increase from a set-associative 
cache with more than four ways is 
overshadowed by the complexity of 
its implementation. The advantage 
over full associativity is readily seen. 
A portion of the address is now used 
to select a set (the number of sets is 
the number of lines in a cache divid- 
ed by ways). The tags are then asso- 
ciatively compared for the number of 
ways rather than for the number of 
lines in the cache, greatly reducing 
the number of associative compares 
that are necessary. This, of course, 
makes for a simpler hardware imple- 
mentation and a faster cache-lookup. 
While the hit ratio is not as high as for 
the fully associative method, it is better 
than for the direct mapped case. 
- JM and TR 
bits in the Cache Control Register. The '040 has no such 
feature for disabling burst mode. 

Another significant difference between the '040 and '030 
involves the cache addressing method. The MC68040 caches are 
physically addressable rather than logically addressable (as 
in the MC68030). Physical addressing provides some distinct 
advantages over logical. It is not necessary for the operating 
system to flush the caches on a task switch. Physical ad- 
dressing also allows external access to the caches (bus snoop- 
ing) without the addition of reverse physical-to-logical ad- 
dress translation. Physical addressing into the caches with a 
4K minimum page size allows cache lookup to run concur- 
rently with address translation. Cache access concurrent with 
address translation guarantees a single-cycle cache lookup. 

FLOATING-POINT IMPLEMENTATION 

The Floating-Point Unit is now on the same chip as the core 
processor. It has been optimized and can concurrently work 
with the Integer Unit. Some of the floating-point instructions 
now show a ten times improvement in speed. Not all of the 
floating-point instructions, however, were implemented on- 
chip. Certain instructions are emulated by the '040 and use the 
floating point instructions on-chip. The actual implementation 
details are beyond the scope of this article. 

While design philosophies and processor architecture have 
gone through a complete renaissance, the processor remains 
approachable and compatible. While we have covered the ma- 
jor differences only, we hope to have motivated you to delve 
deeper into the architecture of the Motorola MC68040. ■ 

John Meek was involved in the design of mainframe computers
at Amdahl and in the design of the HEP II supercomputer. He has
worked in Europe as a research engineer and is currently the
director of engineering at Progressive Peripherals & Software. Tim
Reese has developed a number of accelerated graphics systems for
companies in the U.S. and in Europe and was involved in designing
Sun-compatible workstations at Solbourne Computers. He is
currently the lead hardware engineer at PP&S. Contact them c/o
The AmigaWorld Tech Journal, 80 Elm St., Peterborough, NH
03458, or on BIX (r.brothwell).

ON DISK

Tag Tips 



By John Toebes 



WITH THE 2.0 operating system, Commodore recognized 
that they had to extend a number of existing data structures 
to support new features. In doing so, they encountered a 
number of difficulties: 

• There was no room in the existing structures. 

• Defining default values became complicated. 

• Adding all the features and extra expansion could waste 
a lot of space. 

• They didn't want to run into this problem again when 
they added more features. 

The first problem was the hardest to resolve. There was no
room in such structures as Intuition's NewWindow for the
likes of 3-D-look colors, public screen information, and zoom
gadget requests. It meant that existing programs could not
request any of these fancy new features unless they were
willing to break compatibility, which was not considered a good idea.

There are many tricks to make structures upward compatible
by stealing bits or putting illegal values in certain
fields so the program knows it has a new-style structure, but
you can go only so far. The process soon becomes too complex
to understand and distinguish. If you instead leave a
lot of room in a structure for expansion, you have to rely on
determining appropriate default values that will be acceptable
in the future. This is not only wasteful of space, but
requires being absolutely certain of what each field may be used
for in order to pick an unused value to stick in it.

To combat this problem, Commodore adopted the concept 
of tags. To understand how they work and solve the problem, 
let's look at a structure conceptually. For each field in a struc- 
ture, you have a name that identifies the field and a value that 
you can extract from it. Simply put, structure->member will 
have a given value and the compiler must make the match- 
up between the name you use and the value that it extracts. 
Because the compiler has assigned offsets (and eventually 
storage) within the structure for each field, the slot must al- 
ways contain a value. If you want the field to be optional, you 
must either come up with a null value to indicate so or use 
some bits elsewhere to indicate that it should not be used. 

THE STRUCTURE OF TAGS 

With tags, this mapping from a name to the (potential) value
is performed not by the compiler at compile time, but by
a series of run-time routines. The concept of using a name to
identify the field still applies, but instead of the name
indicating a storage location, it is used to scan a table for any
occurrence of a match. Think of the actual implementation of
this table as an array of name and value pairs. For simplicity,
all values are stored as 32-bit values. This turns out to be
convenient on the Amiga, as almost everything can be
represented in 32 bits (the only exceptions are double-precision
floating-point values). You will find the following definition
for a TagItem in utility/tagitem.h:

struct TagItem {
    Tag   ti_Tag;    /* a ULONG type, as we will see later */
    ULONG ti_Data;
};



A typical tag list looks like: 



    TAG          Data
    TOP          10
    WIDTH        100
    HEIGHT       50
    LEFT         10
    TAG_DONE     <Don't Care>

The tag list above takes up 40 bytes in memory. The first
thing to understand is that there is no particular order
implied about the entries (except for TAG_DONE). The tag list
consisting of:



    TAG          Data
    LEFT         10
    WIDTH        100
    HEIGHT       50
    TOP          10
    TAG_DONE     <Don't Care>

is identical to the first list for all intents and purposes. To
interpret a tag list and extract the data from it, you must search
from the start of the list until you encounter either a TAG_DONE
or the item you desire. What we haven't considered
is how this magic tag is represented. Under X Windows, the
TAG is actually a pointer to a string that contains the text
LEFT, WIDTH, or whatever the tag item is. While this allows
for a lot of flexibility, it takes quite a bit of optimization to
make it efficient to access the tag lists. Instead, Commodore
uses defined numeric values for each of the tag values. A
value of 0, or TAG_DONE, indicates the end of the list. (We
will examine a few other special values soon.)
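As a rough illustration, the first table above could be laid out in C along these lines. This is only a sketch: the struct is a portable stand-in for the one in utility/tagitem.h (both fields fixed at 32 bits so the arithmetic holds on any host), and the DEMO_ tag numbers are placeholders invented for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct TagItem; both fields are fixed at 32 bits
 * to mirror the Amiga's ULONG. */
typedef uint32_t Tag;
struct TagItem {
    Tag      ti_Tag;
    uint32_t ti_Data;
};

#define TAG_DONE     0u            /* end of list, as stated in the text */

/* Placeholder tag numbers, chosen only for this sketch. */
#define DEMO_TOP     0x80000001u
#define DEMO_WIDTH   0x80000002u
#define DEMO_HEIGHT  0x80000003u
#define DEMO_LEFT    0x80000004u

/* The first table from the text, one array element per row. */
static const struct TagItem demo_list[] = {
    { DEMO_TOP,    10  },
    { DEMO_WIDTH,  100 },
    { DEMO_HEIGHT, 50  },
    { DEMO_LEFT,   10  },
    { TAG_DONE,    0   }   /* data is "don't care" */
};

/* Total storage used by the list, in bytes. */
size_t demo_list_bytes(void)
{
    return sizeof demo_list;
}

/* Number of entries, including the terminator. */
size_t demo_list_entries(void)
{
    return sizeof demo_list / sizeof demo_list[0];
}
```

Five entries of eight bytes each account for the 40 bytes mentioned earlier.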

The tags of LEFT, WIDTH, HEIGHT, and TOP are what are
called user tags. They always have the high-order bit set
(0x80000000) to distinguish them from system tags. To help
ensure that they can be identified as user tags, you should
always define them relative to the system-defined
TAG_USER, which is set to 0x80000000. The file
utility/tagitem.h also defines a data type, Tag, that is a
ULONG and is the appropriate type with which to declare
any tags. Typical C code defines tags as follows:

#define LEFT    TAG_USER+1
#define TOP     TAG_USER+2
#define HEIGHT  TAG_USER+3
#define WIDTH   TAG_USER+4

So, when you want to find the WIDTH from the tag list, you
search for an entry that has the value 0x80000004 in the TAG
field. Then you use the corresponding DATA field. This is
actually quite simple and can even be accomplished in C as:

ULONG FindTag(Tag Tagval, ULONG defval, struct TagItem *taglist)
{
    while (taglist->ti_Tag != TAG_DONE)
    {
        if (taglist->ti_Tag == Tagval)
            return(taglist->ti_Data);
        taglist++;    /* advance to the next tag */
    }
    return(defval);   /* We didn't find it, return the default value */
}

In utility.library, Commodore provides a more advanced
version of this routine called GetTagData(). It takes the same
parameters, but it also understands the other system tags we
will see later.
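To see the order independence and the default value in action, here is a small host-side sketch. It repeats the simple FindTag() search from above against stand-in definitions (fixed-width types instead of the Amiga headers) and runs it over the two orderings of the sample list. DEPTH is a hypothetical extra tag that never appears in either list.

```c
#include <stdint.h>

typedef uint32_t Tag;
struct TagItem {
    Tag      ti_Tag;
    uint32_t ti_Data;
};

#define TAG_DONE  0u
#define TAG_USER  0x80000000u
#define LEFT      (TAG_USER + 1)
#define TOP       (TAG_USER + 2)
#define HEIGHT    (TAG_USER + 3)
#define WIDTH     (TAG_USER + 4)
#define DEPTH     (TAG_USER + 5)   /* hypothetical; in neither list */

/* Same linear search as the simple FindTag() routine in the text. */
uint32_t FindTag(Tag tagval, uint32_t defval, const struct TagItem *taglist)
{
    while (taglist->ti_Tag != TAG_DONE) {
        if (taglist->ti_Tag == tagval)
            return taglist->ti_Data;
        taglist++;                 /* advance to the next tag */
    }
    return defval;                 /* not found: use the default */
}

/* The two sample lists: same entries, different order. */
const struct TagItem list_a[] = {
    { TOP, 10 }, { WIDTH, 100 }, { HEIGHT, 50 }, { LEFT, 10 }, { TAG_DONE, 0 }
};
const struct TagItem list_b[] = {
    { LEFT, 10 }, { WIDTH, 100 }, { HEIGHT, 50 }, { TOP, 10 }, { TAG_DONE, 0 }
};
```

Looking up WIDTH yields 100 from either list, while looking up DEPTH simply hands back whatever default the caller supplied.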

A couple of other useful routines you will find in
utility.library are:

TagItem = FindTagItem(Tag, TagList): Locates the TagItem
entry in a TagList that corresponds to the given tag. If it
can't find the Tag in the list, it returns NULL.

flag = TagInArray(Tag, TagList): Returns TRUE if the tag
exists in the TagList and FALSE otherwise. It is very useful
for determining if a particular feature is being requested.

As you can see, even though the TagItem data is a ULONG,
you can store BOOLs, shorts, and even pointers in the entries.
You can even have entries where the mere presence in the
array is enough to trigger a feature.

SYSTEM TAG VALUES 

In addition to the TAG_DONE system tag, there are three 
other system tags: 

TAG_IGNORE: This is simply a placeholder to tell the
tag-list processing to ignore this entry and the data that goes
with it. This is useful for hiding an entry in the array.

TAG_MORE: The data for this entry is a pointer to another
tag list that is considered to be a continuation of the current
tag list. Note that any entries after TAG_MORE in the current
list are ignored. You can think of this system tag as a
form of goto.

TAG_SKIP: The data for this entry is the number of TagItem
entries following the current entry that are to be ignored.
A TAG_SKIP of 0 is the same as a TAG_IGNORE. Think
of this as a jump-ahead-a-few-entries selection.

A little creative work can build tag lists like: 



    TAG_DONE     Don't Care

    TAG_IGNORE   Don't Care
    TAG_DONE     Don't Care

    TAG_MORE     Pointer  --->  TAG_DONE   Don't Care

    TAG_SKIP     0
    TAG_DONE     Don't Care

    TAG_SKIP     1
    Don't Care   Don't Care
    TAG_DONE     Don't Care

All of these are equivalent examples of empty tag lists. The
first is just one entry that says it is the last entry. The second
list has a single entry, but it is a TAG_IGNORE that the
tag-list routines, of course, ignore. The third one is an example
of a TAG_MORE that points to another tag list that happens
to be an empty list. (Never point a TAG_MORE back at itself.
You might get tired of waiting for the tag-list routines to
complete a traversal that never ends.) Example four uses
TAG_SKIP to ignore the current entry. The difference between this
and the TAG_IGNORE is that the data field for the TAG_SKIP
must be 0, while the TAG_IGNORE data field can be set
to anything. The last case shows how to skip over a subsequent
entry without having to rewrite the entire list.
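One way to convince yourself that all five lists really are empty is to walk them with a routine that honors the system tags. The following is a minimal host-side sketch, not the utility.library code: next_real_tag() and count_real_tags() are hypothetical helpers, the struct is a stand-in whose data field is widened to uintptr_t so a pointer fits on any machine, and the system tag numbers are assumed to match those in utility/tagitem.h.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Tag;
struct TagItem {
    Tag       ti_Tag;
    uintptr_t ti_Data;     /* wide enough to hold a pointer on any host */
};

#define TAG_DONE    0u
#define TAG_IGNORE  1u
#define TAG_MORE    2u
#define TAG_SKIP    3u

/* Return the next ordinary entry, or NULL at the end of the list.
 * *cursor is left pointing just past the entry that was returned. */
const struct TagItem *next_real_tag(const struct TagItem **cursor)
{
    const struct TagItem *p = *cursor;

    for (;;) {
        switch (p->ti_Tag) {
        case TAG_DONE:                       /* end of the list */
            *cursor = p;
            return NULL;
        case TAG_IGNORE:                     /* hide this one entry */
            p++;
            break;
        case TAG_MORE:                       /* "goto" another list */
            p = (const struct TagItem *)p->ti_Data;
            break;
        case TAG_SKIP:                       /* this entry plus ti_Data more */
            p += 1 + p->ti_Data;
            break;
        default:                             /* an ordinary tag */
            *cursor = p + 1;
            return p;
        }
    }
}

/* Count the ordinary entries a caller would actually see. */
size_t count_real_tags(const struct TagItem *list)
{
    size_t n = 0;
    while (next_real_tag(&list) != NULL)
        n++;
    return n;
}

/* Build the five "empty" lists from the text and total their contents. */
size_t demo_empty_lists_total(void)
{
    struct TagItem l1[] = { { TAG_DONE, 0 } };
    struct TagItem l2[] = { { TAG_IGNORE, 99 }, { TAG_DONE, 0 } };
    struct TagItem l3[] = { { TAG_MORE, (uintptr_t)l1 }, { TAG_DONE, 0 } };
    struct TagItem l4[] = { { TAG_SKIP, 0 }, { TAG_DONE, 0 } };
    struct TagItem l5[] = { { TAG_SKIP, 1 }, { 0x80000001u, 7 }, { TAG_DONE, 0 } };

    return count_real_tags(l1) + count_real_tags(l2) + count_real_tags(l3)
         + count_real_tags(l4) + count_real_tags(l5);
}

/* A list with two visible entries and one hidden by TAG_IGNORE. */
size_t demo_mixed_list_count(void)
{
    struct TagItem l[] = {
        { 0x80000001u, 7 }, { TAG_IGNORE, 0 }, { 0x80000002u, 8 }, { TAG_DONE, 0 }
    };
    return count_real_tags(l);
}
```

Note that the sketch has no protection against a TAG_MORE that points back at its own list; it would loop forever, just as the warning above suggests.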

TWO TYPES OF CALLS 

With a clear idea of how tags sit in memory, you are ready to
learn how to use them with the system routines. There are two
ways to call any of the system routines that take tag lists as
parameters. You can specify the tag-list entries as parameters to the
function and call it through a stub routine in amiga.lib, which
turns the parameters into a tag list for you. The other method is
to build a tag list ahead of time and then pass the address of the
tag list as a parameter to the routine. For example:

struct TagItem tagarray[1] = { { TAG_DONE, 0 } };
OpenWindowTagList(nw, tagarray);

and

OpenWindowTags(nw, TAG_DONE, 0);

are functionally identical. The choice of usage is up to you, 
but you should be aware of some important problems: 

• When using the tags as individual parameters, you are go- 
ing through a stub routine that may prevent you from mak- 
ing the code resident. 

• Pushing a lot of parameters on the stack at run time is ex- 
pensive and slow. 

• Some versions of the commercial compilers have prob- 
lems with such constructs as: 

struct TagItem tagarray[1] = { { W_NAME, (ULONG)"TitleName" } };
Contact your compiler vendor for a solution if necessary. 

Unfortunately, there is no consistent naming convention for 
the tag-list routines. In most cases, the tag-list parameter ver- 
sion has a capital A on the end of the name. For example: 

SetAttrsA(object, taglist); 
SetAttrs(object, tag, val, ... );

Consult the include files and AutoDocs for the function you 
want to call to find the right name. 
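The relationship between the two calling styles can be sketched in portable C. Everything here is hypothetical: BoxAreaA() plays the part of a routine that takes a ready-made tag list, BoxArea() plays the part of the stub that accepts the entries as individual parameters, and the BOX_ tags are invented for the example. This sketch copies the arguments into a local array; it makes no claim about how the real amiga.lib stubs are built.

```c
#include <stdarg.h>

typedef unsigned long Tag;           /* stand-in for the Amiga's ULONG */
struct TagItem {
    Tag           ti_Tag;
    unsigned long ti_Data;
};

#define TAG_DONE    0UL
#define TAG_USER    0x80000000UL
#define BOX_Width   (TAG_USER + 1)   /* hypothetical tags for this sketch */
#define BOX_Height  (TAG_USER + 2)

#define MAX_STUB_TAGS 16             /* arbitrary limit for the sketch */

/* "A" version: receives the address of a finished tag list. */
unsigned long BoxAreaA(const struct TagItem *tags)
{
    unsigned long width = 1, height = 1;     /* defaults */

    for (; tags->ti_Tag != TAG_DONE; tags++) {
        if (tags->ti_Tag == BOX_Width)
            width = tags->ti_Data;
        else if (tags->ti_Tag == BOX_Height)
            height = tags->ti_Data;
    }
    return width * height;
}

/* Stub version: gathers tag/data parameter pairs into a list,
 * then calls the "A" version. Every argument must be unsigned long. */
unsigned long BoxArea(Tag first, ...)
{
    struct TagItem items[MAX_STUB_TAGS + 1];
    int n = 0;
    Tag tag = first;
    va_list ap;

    va_start(ap, first);
    while (tag != TAG_DONE && n < MAX_STUB_TAGS) {
        items[n].ti_Tag  = tag;
        items[n].ti_Data = va_arg(ap, unsigned long);
        n++;
        tag = va_arg(ap, Tag);
    }
    va_end(ap);

    items[n].ti_Tag  = TAG_DONE;     /* always terminate the list */
    items[n].ti_Data = 0;
    return BoxAreaA(items);
}

/* A prebuilt list for the "A" style of call. */
const struct TagItem box_tags[] = {
    { BOX_Width, 100 }, { BOX_Height, 50 }, { TAG_DONE, 0 }
};
```

With these definitions, BoxArea(BOX_Width, 100UL, BOX_Height, 50UL, TAG_DONE) and BoxAreaA(box_tags) ask for exactly the same thing. The extra copying in the stub is one illustration of why pushing many parameters at run time costs more than passing a prebuilt list.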

CONVERTING TO TAGS 

All of the advantages of tags are irrelevant if you can't 
translate your favorite routines to use them. Consider the ex- 
ample of a simple OpenScreen() call. Here's how you would 
have coded it before 2.0: 

struct NewScreen NewScreen =
{
    0, 0,            /* Left and top edges */
    320, 200,        /* Width and Height */
    2,               /* Depth */
    0, 1,            /* Detail and Block Pens */
    NULL,            /* Display Modes */
    CUSTOMSCREEN,    /* Screen Type */
    NULL,            /* No special font */
    "My Screen",     /* Screen title */
    NULL,            /* No screen Gadgets */
    NULL             /* No custom Bitmap */
};



lots of code here 



screen = OpenScreen(&NewScreen); 



With the 2.0 OpenScreenTags(), you can code it as:

screen = OpenScreenTags(NULL,
            SA_Width,  320,
            SA_Height, 200,
            SA_Depth,  2,
            SA_Title,  (ULONG)"My Screen",
            SA_Type,   CUSTOMSCREEN,
            TAG_DONE,  0);

You obtain the values for the tags by reading
intuition/screens.h (or the AutoDocs for OpenScreenTagList()). You
must use the exact tag names defined there for the system
routines to understand what you are asking for.
Note that you do not have to specify a BitMap, display
modes, or even the left and top edges, because Intuition will
use an appropriate default when it does not find the corresponding
entry in the tag list.

Another useful tag-list function is System(). While it used
to be difficult to execute a command and gain control of its
output and how it ran, with System() and a few tag-list entries,
you can do almost anything. The include file dos/dostags.h
defines everything you can ask of DOS.

TAKE A SECOND LOOK 

Now that you have seen a bit of how simple tag usage is,
note that you must be careful which tag values you are using.
Because each library in the system restarts numbering its
tags at TAG_USER, you can encounter many situations in
which the same value means different things. For example,
a tag value of 0x80000021 indicates SYS_Input to dos.library
tag routines, while it means SA_Left to Intuition routines.
A mistaken value can be very hard to debug, so be
careful when coding.

The other thing to think about is interpreting the data for
the tag items. For some tag values, the data is simply a
boolean value (such as SYS_Input), while for others it is a
numeric value (such as SA_Left). You have tags that are for
pointers (SA_Title) and even those that are for bit flags
(SA_Type). Because the compiler does no type checking on
these tag lists, you should take care to match the right type
of parameter to the tag.

ADVANCED TAG MANIPULATION 

Utility.library provides several additional functions for
manipulating tag lists. A couple (such as NextTagItem() and
CloneTagItems()) you may never use unless you get very
heavily into manipulating tags, but the following three can
make life very easy:

PackBoolTags(flags, TagList, BoolTagMap);
FilterTagList(TagList, TagArray, Logic);
MapTags(TagList, MapList, IncludeMissFlag);

PackBoolTags() allows you to construct a bitmask of data 
values based on both a user tag list and predefined defaults. 
Let's take an example of a window system and a few tags: 

#define T_BORDER    TAG_USER+1
#define T_SIZING    TAG_USER+2
#define T_CLOSE     TAG_USER+3
#define T_DRAGBAR   TAG_USER+4
#define T_ZOOMBOX   TAG_USER+5
#define T_DEPTHBOX  TAG_USER+6
#define T_TITLE     TAG_USER+7

You can define a correspondence for these bitmasks with 
(you guessed it) a TagArray: 

struct TagItem maskmap[] = {
    { T_BORDER,   0x0001 },
    { T_SIZING,   0x0002 },
    { T_CLOSE,    0x0004 },
    { T_DRAGBAR,  0x0008 },
    { T_ZOOMBOX,  0x0010 },
    { T_DEPTHBOX, 0x0020 },
    { TAG_DONE,   0 } };

Now, if you implement a routine similar to an OpenWin- 
dow() routine that takes a tag list to specify the attributes of 
the window desired, you can use this maskmap tag list to 
build a mask of the requested features in a single pass. You can 
also start out with a default set of features (such as a DRAG- 
BAR and a SIZING gadget) and allow the user to disable it if 
desired. If the routine were called with a tag list such as: 

struct TagItem example[] = {
    { T_BORDER,   TRUE },
    { T_DRAGBAR,  FALSE },
    { T_TITLE,    (ULONG)"Title" },
    { T_CLOSE,    TRUE },
    { T_DEPTHBOX, FALSE },
    { TAG_DONE,   0 } };

and you call: 

final = PackBoolTags(0x000A, example, maskmap);

final is set to a value of 0x0007, requesting a BORDER, SIZING,
and CLOSE box, but not a DRAGBAR, ZOOMBOX, or
DEPTHBOX. To see why, take a look at how PackBoolTags() works:

1. It starts with the 0x000A passed in (T_SIZING | T_DRAGBAR).

2. It goes to the first entry in example and finds T_BORDER.
It searches maskmap and finds it has a value of 0x0001.
Because the data for T_BORDER in the example is TRUE, it
combines 0x0001 into 0x000A via OR, resulting in 0x000B
(T_SIZING | T_BORDER | T_DRAGBAR).

3. It proceeds to the next entry, T_DRAGBAR. Looking it up
in maskmap produces 0x0008. Because the example specifies
it as FALSE, it combines the current value with the complement
via AND and produces 0x0003 (T_BORDER | T_SIZING).

4. It steps to the next entry, T_TITLE. Because it does not
find T_TITLE in maskmap, it just ignores T_TITLE and keeps
the value of 0x0003.

5. The next entry is T_CLOSE. Maskmap gives it 0x0004,
which is added in with OR, because the example specified it
as TRUE. Our value is now 0x0007 (T_BORDER | T_SIZING |
T_CLOSE).

6. Next, maskmap yields 0x0020 for T_DEPTHBOX. The
complement of 0x0020 is combined with 0x0007 via AND,
because the example specified it as FALSE. Because the bit was
not on to begin with, the value remains the same.

7. The last entry is TAG_DONE. PackBoolTags() has done
its work, so it quits.

The extremely nice aspect of this interface is that you can
change the defaults and even the internal bit ordering without
having to change a single line of user code. The code continues
to request features and the target routine can optimize it into
its own bit fields.
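The seven steps above can be traced with a few lines of portable C. This is a sketch of the procedure as described, not the utility.library implementation: pack_bool_tags() is a hypothetical name, the types are host-side stand-ins, only TAG_DONE is honored among the system tags, and the T_TITLE data is a placeholder number rather than a real string pointer.

```c
#include <stdint.h>

typedef uint32_t Tag;
struct TagItem {
    Tag      ti_Tag;
    uint32_t ti_Data;
};

#define TAG_DONE    0u
#define TAG_USER    0x80000000u
#define T_BORDER    (TAG_USER + 1)
#define T_SIZING    (TAG_USER + 2)
#define T_CLOSE     (TAG_USER + 3)
#define T_DRAGBAR   (TAG_USER + 4)
#define T_ZOOMBOX   (TAG_USER + 5)
#define T_DEPTHBOX  (TAG_USER + 6)
#define T_TITLE     (TAG_USER + 7)

/* Tag-to-bit correspondence, as in the maskmap from the text. */
const struct TagItem maskmap[] = {
    { T_BORDER,   0x0001 }, { T_SIZING,   0x0002 }, { T_CLOSE,    0x0004 },
    { T_DRAGBAR,  0x0008 }, { T_ZOOMBOX,  0x0010 }, { T_DEPTHBOX, 0x0020 },
    { TAG_DONE,   0 }
};

/* The caller's request; 1 and 0 stand for TRUE and FALSE. */
const struct TagItem example[] = {
    { T_BORDER,   1 },
    { T_DRAGBAR,  0 },
    { T_TITLE,    0x1234 },   /* placeholder for the title pointer */
    { T_CLOSE,    1 },
    { T_DEPTHBOX, 0 },
    { TAG_DONE,   0 }
};

/* For each entry in list that also appears in map, set the mapped bit
 * when the entry's data is non-zero and clear it when the data is zero.
 * Entries with no match in map leave flags untouched. */
uint32_t pack_bool_tags(uint32_t flags, const struct TagItem *list,
                        const struct TagItem *map)
{
    for (; list->ti_Tag != TAG_DONE; list++) {
        const struct TagItem *m;
        for (m = map; m->ti_Tag != TAG_DONE; m++) {
            if (m->ti_Tag == list->ti_Tag) {
                if (list->ti_Data)
                    flags |= m->ti_Data;     /* TRUE: OR the bit in */
                else
                    flags &= ~m->ti_Data;    /* FALSE: AND it out */
                break;
            }
        }
    }
    return flags;
}
```

Starting from the default of 0x000A, the sketch ends at 0x0007, matching the walk-through. Starting from 0 instead gives 0x0005, since nothing in the example ever asks for SIZING.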

The other routines, FilterTagList() and MapTags(), are useful
for changing tag lists around when moving from one set
of tag values to another. FilterTagList() changes all located
tags to a TAG_IGNORE. For example, if you want to eliminate
all SC_VAL1 and SC_VAL2 tag entries from a tag list, use:

Tag filterlist[] = { SC_VAL1, SC_VAL2, TAG_DONE };

FilterTagList(taglist, filterlist, TAGFILTER_AND);

Note that although filterlist is terminated by a TAG_DONE,
it is not a tag list. It is simply an array of tag values
terminated by a TAG_DONE. After the operation, all occurrences
of SC_VAL1 and SC_VAL2 entries in the list will be
set to TAG_IGNORE. You can do the opposite, filtering out
everything but the ones specified, with:

FilterTagList(taglist, filterlist, TAGFILTER_NOT);
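The effect of the two filtering modes, as described above, can be modeled on any machine. The sketch below is an emulation only: filter_tags() and the HIDE_ constants are hypothetical names (they deliberately avoid reusing the library's TAGFILTER_ constants), the SC_VAL numbers are made up, and the types are stand-ins for the Amiga headers.

```c
#include <stdint.h>

typedef uint32_t Tag;
struct TagItem {
    Tag      ti_Tag;
    uint32_t ti_Data;
};

#define TAG_DONE    0u
#define TAG_IGNORE  1u
#define TAG_USER    0x80000000u
#define SC_VAL1     (TAG_USER + 1)   /* hypothetical values */
#define SC_VAL2     (TAG_USER + 2)
#define SC_VAL3     (TAG_USER + 3)

/* Hypothetical mode names for this sketch. */
enum { HIDE_LISTED, HIDE_UNLISTED };

/* Is tag present in a TAG_DONE-terminated array of tag values? */
static int tag_listed(Tag tag, const Tag *filter)
{
    for (; *filter != TAG_DONE; filter++)
        if (*filter == tag)
            return 1;
    return 0;
}

/* Overwrite selected entries with TAG_IGNORE; return how many changed. */
unsigned filter_tags(struct TagItem *list, const Tag *filter, int logic)
{
    unsigned hidden = 0;

    for (; list->ti_Tag != TAG_DONE; list++) {
        int listed;
        if (list->ti_Tag == TAG_IGNORE)
            continue;                        /* already hidden */
        listed = tag_listed(list->ti_Tag, filter);
        if ((logic == HIDE_LISTED) ? listed : !listed) {
            list->ti_Tag = TAG_IGNORE;       /* data is left alone */
            hidden++;
        }
    }
    return hidden;
}

/* Filter a fresh three-entry list and report the tag left in one slot. */
Tag demo_filter(int logic, int index)
{
    struct TagItem list[] = {
        { SC_VAL1, 1 }, { SC_VAL3, 3 }, { SC_VAL2, 2 }, { TAG_DONE, 0 }
    };
    const Tag filter[] = { SC_VAL1, SC_VAL2, TAG_DONE };

    filter_tags(list, filter, logic);
    return list[index].ti_Tag;
}
```

Because hidden entries are merely retagged as TAG_IGNORE, the list keeps its length and its terminator, so any routine that already walks tag lists can consume the result unchanged.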

For a little more sophisticated filtering of tags, you can use
MapTags(). Like FilterTagList(), it applies mass changes to a
list, but instead of putting in TAG_IGNORE, it allows you to
replace the tags with anything. For example, consider:

struct TagItem list[] = {
    { MY_SIZE,   71 },
    { MY_WEIGHT, 200 },
    { TAG_END,   0 } };

and a mapping tag list of

struct TagItem map[] = {
    { MY_SIZE, HIS_TALL },
    { TAG_END, 0 } };

where MY_SIZE might be a form of height in my set of routines,
while another set of routines could expect a HIS_TALL
tag for the same information. Instead of making the user
replicate the information in the tag list, you can change it on
the fly. If you call MapTags(list, map, 0), list becomes:

    { HIS_TALL,   71 }
    { TAG_IGNORE, 200 }
    { TAG_END,    0 }

If, instead, you call MapTags(list, map, 1), it leaves
MY_WEIGHT in to give you:

    { HIS_TALL,  71 }
    { MY_WEIGHT, 200 }
    { TAG_END,   0 }

Note that, for safety reasons, attempting to map some tag
value to TAG_END will result in the MapTags() routine
substituting TAG_IGNORE.
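The mapping behavior described above, including the TAG_END safety rule, can be modeled in a short host-side sketch. It is an emulation under stated assumptions, not the library routine: map_tags() is a hypothetical name, the MY_ and HIS_ tag numbers are invented, and the types are stand-ins for the Amiga headers.

```c
#include <stdint.h>

typedef uint32_t Tag;
struct TagItem {
    Tag      ti_Tag;
    uint32_t ti_Data;
};

#define TAG_END     0u               /* same value as TAG_DONE */
#define TAG_IGNORE  1u
#define TAG_USER    0x80000000u
#define MY_SIZE     (TAG_USER + 1)       /* hypothetical values */
#define MY_WEIGHT   (TAG_USER + 2)
#define HIS_TALL    (TAG_USER + 0x101)

/* Rewrite each tag in list according to map. Tags with no mapping are
 * kept when include_miss is non-zero and hidden otherwise. */
void map_tags(struct TagItem *list, const struct TagItem *map, int include_miss)
{
    for (; list->ti_Tag != TAG_END; list++) {
        const struct TagItem *m;
        int found = 0;

        if (list->ti_Tag == TAG_IGNORE)
            continue;
        for (m = map; m->ti_Tag != TAG_END; m++) {
            if (m->ti_Tag == list->ti_Tag) {
                Tag to = (Tag)m->ti_Data;
                /* Never let a mapping cut the list short. */
                list->ti_Tag = (to == TAG_END) ? TAG_IGNORE : to;
                found = 1;
                break;
            }
        }
        if (!found && !include_miss)
            list->ti_Tag = TAG_IGNORE;
    }
}

/* Map the sample list from the text and report the tag in one slot. */
Tag demo_map(int include_miss, int index)
{
    struct TagItem list[] = {
        { MY_SIZE, 71 }, { MY_WEIGHT, 200 }, { TAG_END, 0 }
    };
    const struct TagItem map[] = { { MY_SIZE, HIS_TALL }, { TAG_END, 0 } };

    map_tags(list, map, include_miss);
    return list[index].ti_Tag;
}

/* Try to map MY_SIZE onto TAG_END and report what was stored instead. */
Tag demo_map_to_end(void)
{
    struct TagItem list[] = { { MY_SIZE, 71 }, { TAG_END, 0 } };
    const struct TagItem map[] = { { MY_SIZE, TAG_END }, { TAG_END, 0 } };

    map_tags(list, map, 1);
    return list[0].ti_Tag;
}
```

In every case only the tag changes; the data value of 71 or 200 travels along untouched.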

WRAPPING IT UP 

Tags can be very powerful and can make your code quite 
a bit simpler in the long run. In particular, the introduction 
of tags ensures that your code will continue to function with 
newer releases of the operating system that add more fea- 
tures, while still allowing for expansion. As long as you keep 
track of the tag type and the type of values they take, you 
should have little trouble using them. ■ 

John Toebes is coordinator for The Software Distillery. He was
a major developer of the SAS/C system for AmigaDOS. Contact
him c/o The AmigaWorld Tech Journal, 80 Elm St., Peterborough,
NH 03458, or on CompuServe (72230,303).

GRAPHICS HANDLER



ON DISK 



2.0's Graphics.Library



By Spencer Shanson 



THE SURFACE DIFFERENCES between 2.0 and previous
versions of the operating system jump at you the moment
you boot your Amiga. Gone are the old blue-orange-black-white
color scheme and flat design; in their places are refreshing
greys and blues, drop shadows, new gadgets, and a
generally much more professional look. Beneath the Intuition
and Workbench facelift lie the more exciting differences:
the changes to the graphics.library, which houses the
code that makes all the colors, displays, lines, and icons
available, shoves them onto your screen, and, no sooner than one
screen is done, makes the next a fraction of a second later. Although
average users are sheltered from it, programmers will appreciate
how much work has gone into the graphics.library
for 2.0. (Note that I will refer to 2.0 as V37 from now
on, in line with 1.3 being V35, and so on. V36 was the alpha
and early beta version of 2.0. When opening a system library,
you should use V36 if you know a feature is available in V36,
otherwise use V37. There are very few features in V37 that
were not in V36.)

CHIPS (BUT NO FISH) 

The main purpose of the 2.0 graphics.library was to provide
support for Commodore's Enhanced Chip Set (ECS).
The ECS consists of new versions of the display chip (Denise)
and the graphics and DMA engine (Agnus). What does the
ECS give you that the previous chip revisions did not?

Agnus can now work in PAL or NTSC mode. A jumper on
the A2000 or a track on an A500 configures your machine to
be PAL or NTSC by default, but this can be overridden by the
V37 software. The ECS also provides a new display mode
called SuperHires, which gives twice the resolution of a hi-res
screen (1280 pixels wide standard). Because of DMA restrictions,
however, you can run this mode with a maximum
of only two bitplanes (four colors). As well as providing timings
for NTSC/PAL and SuperHires, the ECS Denise can also
be programmed with variable beam rates, meaning Denise
can provide a VGA-like mode (640x480, noninterlaced) called
Productivity mode on a multisync monitor. Variable beam
rates also need ECS Agnus for DMA timing. Again, DMA
restricts Productivity mode to two bitplanes.

All this is very nice, but ECS Agnus also takes a positive
step towards alleviating the regular gripe of "only having
512K of chip RAM," because it can support one megabyte of
chip RAM (or two megabytes on the A3000). You can now
have larger pictures, animations, and sound samples in that
precious chip RAM of yours. To utilize the extended memory,
Agnus has increased its blitting range from 1008x1024
pixels to 32Kx32K pixels! As a given, this also extends its
line-drawing capability to lines that are 32K pixels long.

Denise is responsible for defining the display window on 
your monitor. (The display window is the area on a screen in 
which all the action happens.) Denise's job is to take the dis- 
play data from Agnus, mix it with the sprite data, prioritize 
the sprites and playfields, convert palette numbers to Red, 
Green, and Blue values, and shove the result out of the RGB 
port to your monitor. If that's not enough, it also provides 
genlock support. 

Under 1.3 and earlier, Denise was restricted in display-window
size. Horizontally, the display window had to start
within a fixed left-hand portion of the display and finish within
a fixed right-hand portion. Vertically, it had to start within a
fixed upper portion and finish within a fixed lower portion.
These restrictions were imposed because the required
Most Significant Bits were nonprogrammable, and
were hardcoded in silicon. In the ECS, a new programmable
register was added to supersede the hardcoded values, so
poof go those restrictions!

Probably the most interesting features of the ECS Denise,
especially for the video world, are its new genlock modes. The
old Denise set a pin on the RGB port called ZD (Zero Detect)
when the chip came across a pixel that was set to color 0 (the
background color). An external genlock could watch this pin,
and when the pin was asserted, it could output a video signal
(from a VCR, video camera, or whatever) instead of the Amiga's
regular signal. That's how genlocks work. But just genlocking
on the background color was not always enough for
some people. They wanted to be able to genlock on any color
on the screen (called chromakeying), and some more expensive
genlocks provided this feature themselves. The ECS
Denise can now do that as well, plus it provides bitplane overlay,
so an entire bitplane controls genlocking (to provide those
keyhole-type effects), and BorderBlanking, which creates a
transparent "frame" surrounding the active area.

THE GRAPHIC DATABASE 

"So," you say, "that's what the ECS hardware does. How 
do I make my application software use these new functions?" 

As before, you can determine the amount of chip RAM 
you have available by calling: 

AvailMem(MEMF_CHIP) 

Your code should be able to handle the (possible) new size.
The blit sizes are a different problem. Following the tradition
of assumed values ("blits are always 1Kx1K"), we could
easily suppose "if a machine has an ECS Agnus, we can blit
32Kx32K." We could, but we shouldn't. What if Commodore
releases another version of Agnus that supports even larger
blits? Are you supposed to use another set of hardcoded values?
To alleviate this problem (and many others), V37 graphics.library
has a graphics database that contains information
on dimensions, monitors, and display for every known display
mode.

Prior to V37, you set display modes in the ViewPort->Modes
field, which was 16 bits wide (one WORD). This field
was getting fairly cramped, especially with the need for the
new modes (SUPERHIRES) and more monitor types (PAL,
NTSC, VGA). Therefore, every display mode in V37 has an
associated 32-bit (one LONG) DisplayID value. The upper
WORD holds the monitor type number, while the lower
WORD defines the monitor's various modes. For compatibility
with existing software, if the upper word of the LONG
DisplayID is 0, then the "default" monitor (NTSC or PAL,
depending on your Amiga's configuration) is assumed. The
monitor numbers and the DisplayID values are defined in a
new header file, graphics/displayinfo.h.
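To make the upper-word and lower-word split concrete, here is a small sketch that takes a DisplayID apart with shifts and masks. The helper names are hypothetical and are not part of graphics.library. The value 0x00019004 is the NTSC hi-res interlaced ID used for the sample database entry in this article.

```c
#include <stdint.h>

/* Upper WORD of a DisplayID: the monitor type number. */
uint16_t display_monitor(uint32_t id)
{
    return (uint16_t)(id >> 16);
}

/* Lower WORD of a DisplayID: the mode bits for that monitor. */
uint16_t display_mode(uint32_t id)
{
    return (uint16_t)(id & 0xFFFFu);
}

/* A zero upper WORD means "use the default monitor". */
int uses_default_monitor(uint32_t id)
{
    return display_monitor(id) == 0;
}
```

For 0x00019004 the monitor number comes out as 0x0001 and the mode bits as 0x9004. An old-style value with nothing in the upper word, such as 0x00008004, falls through to the default monitor.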

You can find the DisplayID of a ViewPort using the new
V37 graphics function:

ULONG GetVPModeID(struct ViewPort *vp).

To be compatible with future revisions of the OS, you
should use this function from now on, rather than using
ViewPort->Modes. In addition, be sure to check the return
value against INVALID_ID (in graphics/displayinfo.h) for
errors.

When you have this ID, you can use it to read information
from the database. Four types of information are available
(Monitor, Dimension, Name, and Display), and each contains
information similar to the following sample (which is for the
NTSC hi-res interlaced mode):

Mode 0x00019004
Name : NTSC:Hires-Interlaced

MonitorInfo
{
    ViewPosition = (0x73, 0x2C) = (115, 44)
    ViewResolution = (0x2C, 0x34) = (44, 52) (ticks per pixel)
    ViewPositionRange Rectangle (fixed, hardware dependent) =
        (0x5D, 0x15) - (0x88, 0x3F) =
        (93, 21) - (136, 63)
    TotalRows = 0x106 = 262
    TotalColorClocks = 0xE2 = 226
    MinRow = 0x15 = 21
    Compatibility = 0x0 = MCOMPAT_MIXED
        (can share display with other MCOMPAT_MIXED modes)
}

DimensionInfo
{
    MaxDepth = 0x4    /* The number of bitplanes supported */
    MinRasterWidth = 0x20 = 32 pixels
    MinRasterHeight = 0x1 = 1 pixels
    MaxRasterWidth = 0x3FF0 = 16368 pixels    /* determined by Agnus type */
    MaxRasterHeight = 0x4000 = 16384 pixels   /* determined by Agnus type */
    Nominal Rectangle (Standard Dimensions) =
        (0x0, 0x0) - (0x27F, 0x18F) =
        (0, 0) - (639, 399)
    MaxOScan Rectangle (fixed, hardware dependent) =
        (0xFFFFFFD4, 0xFFFFFFD2) - (0x2A7, 0x1B3) =
        (-44, -46) - (679, 435)
    VideoOScan Rectangle (fixed, hardware dependent) =
        (0xFFFFFFD4, 0xFFFFFFD2) - (0x2B3, 0x1B3) =
        (-44, -46) - (691, 435)
    TxtOScan Rectangle (Editable via Preferences) =
        (0x0, 0x0) - (0x2A7, 0x18F) =
        (0, 0) - (679, 399)
    StdOScan Rectangle (Editable via Preferences) =
        (0xFFFFFFF8, 0x0) - (0x2A7, 0x18F) =
        (-8, 0) - (679, 399)
}

DisplayInfo
{
    NotAvailable = 0x0    /* if non-zero, this mode may not be
                           * available if, for example, this mode
                           * requires ECS chips, and the Amiga
                           * does not have them. You can determine
                           * a mode's availability with the V37
                           * ModeNotAvailable() function
                           */
    PropertyFlags = 0xBC1
        LACE SPRITES GENLOCK WB DRAGGABLE BEAMSYNC
    Resolution = (0x16, 0x1A) = (22, 26) (ticks per pixel)
    PixelSpeed = 0x46ns = 70ns
    NumStdSprites = 8
    PaletteRange = 0x1000 = 4096
    SpriteResolution = (0x2C, 0x34) = (44, 52) (sprite ticks per pixel)
}

As you can see, the information for each possible mode is
fairly extensive. (Note: V37 now provides OS support for
overscan.) You read the database via a DisplayInfoHandle.
First, find the handle:

handle = FindDisplayInfo(ULONG DisplayID).

Now, pass it to:

ULONG GetDisplayInfoData(DisplayInfoHandle handle, UBYTE *buf,
    ULONG size, ULONG tagID, [ULONG ID]).

This takes the handle, a pointer to a buffer, the number of
bytes to copy from the database into your buffer, and an
identifier for the type of information you need. Alternatively, you
can pass NULL as the handle with a DisplayID value. To
iterate through the entire database of known modes use:

ULONG NextDisplayInfo(ULONG DisplayID)

As an example of how all this works together, the follow- 
ing code prints the maximum depth (bitplanes) supported by 
each known display mode: 

#include <exec/memory.h>
#include <graphics/gfxbase.h>
#include <graphics/displayinfo.h>
#include <stdio.h>

/* MAX() is assumed to be supplied by your compiler's macro header. */

struct GfxBase *GfxBase;

void main()
{
    APTR buf;
    ULONG ID = INVALID_ID;

    if (GfxBase = (struct GfxBase *)OpenLibrary("graphics.library", 36))
    /* V36 is the early KS2.0 */
    {
        if (buf = AllocMem(MAX(sizeof(struct DimensionInfo),
                               sizeof(struct NameInfo)), MEMF_CLEAR))
        {
            while ((ID = NextDisplayInfo(ID)) != INVALID_ID)
            {
                /* Iterate through each known ID */
                printf("ID 0x%lx ", ID);

                if (GetDisplayInfoData(NULL, buf, sizeof(struct NameInfo),
                                       DTAG_NAME, ID))
                {
                    printf("Name - %s, ", ((struct NameInfo *)buf)->Name);
                }

                if (GetDisplayInfoData(NULL, buf, sizeof(struct DimensionInfo),
                                       DTAG_DIMS, ID))
                {
                    printf("MaxDepth = %ld\n",
                           ((struct DimensionInfo *)buf)->MaxDepth);
                }
            }

            FreeMem(buf, MAX(sizeof(struct DimensionInfo),
                             sizeof(struct NameInfo)));
        }

        CloseLibrary(GfxBase);
    }
}



GENLOCKS AND KEYHOLES 



All the new genlock information is stored in a ColorMap
structure, and can be controlled via the new VideoControl()
function. Because the ColorMap has grown and future versions
are also likely to grow, you should be getting and freeing
your ColorMap structure with the GetColorMap() and
FreeColorMap() functions. These graphics functions know
exactly how big the ColorMap structure is for the given Kickstart
version number. Therefore, you are guaranteed to have
a valid up-to-date ColorMap structure. A ColorMap is associated
with a ViewPort, so you can use VideoControl() to alter
the genlock mode on a ViewPort-by-ViewPort basis.

VideoControl() takes a ColorMap structure and a pointer
to a TagList:

ULONG VideoControl(struct ColorMap *cm, struct TagItem *ti);

The TagList is a list of instructions for VideoControl() to
perform on the ColorMap. Some of these instructions require
extra data, and some are Boolean. After receiving the TagList,
VideoControl() can alter it. As a demonstration, the following
routine determines if the genlock BorderBlank feature is
enabled for the current ColorMap, and enables ChromaKey
for color 3. All the tags are defined in
graphics/videocontrol.h. (For a general discussion of tags, see
"Digging Deep in the OS," p. 8.)

ULONG GenlockStuff(struct ColorMap *cm)
{
#define TAG_COUNT 3  /* # of instructions to pass to VideoControl() */

    struct TagItem *ti;
    ULONG result = 1;

    if (ti = (struct TagItem *)AllocMem((sizeof(struct TagItem) * TAG_COUNT), 0))
    {
        ti[0].ti_Tag = VTAG_BORDERBLANK_GET; ti[0].ti_Data = NULL;
        ti[1].ti_Tag = VTAG_CHROMA_PEN_SET;  ti[1].ti_Data = 3;
        ti[2].ti_Tag = VTAG_END_CM;          ti[2].ti_Data = NULL;
        /* shows end of TagList */

        /* now the TagList is set up, pass it to VideoControl(). A non-NULL
         * return value signifies an error: either a bad ColorMap
         * (of type pre-V36), or a bad taglist.
         * ti[0].ti_Tag will be changed to either VTAG_BORDERBLANK_SET or
         * VTAG_BORDERBLANK_CLR, depending on its setting.
         */

        if ((result = VideoControl(cm, ti)) == NULL)
        {
            printf("BorderBlank is %s\n",
                   (ti[0].ti_Tag == VTAG_BORDERBLANK_SET) ? "Set" : "Clear");
        }
        else
        {
            printf("VideoControl() error\n");
        }

        FreeMem(ti, sizeof(struct TagItem) * TAG_COUNT);
    }
    else
    {
        printf("Could not allocate memory\n");
    }

    return(result);
}

You can control other aspects of the ColorMap (and hence
the ViewPort) with VideoControl(), as well. For example, as
anyone who has ever used UserCopperLists knows, User-
CopperLists "leak" through into other screens as the screens
are dragged around. Look at PhotonPaint for an example: Put
PhotonPaint in back of all the other screens, and then drag
one of the frontmost screens down, so that the top of the Pho-
tonPaint screen shows. With VideoControl(), you can now
turn on "UserCopperList clipping" and stop the leaking.

OVERSCAN— WALL-TO-WALL VIDEO 

You can also set a ViewPort's DisplayClip with VideoCon-
trol(). A DisplayClip is the total area of a ViewPort and can de-
fine regions greater than the standard area. The effect of this
capability is known as overscan. Overscan is used by many ap-
plications, notably paint packages, to increase the working
area. It is also useful when sending the output of your Amiga
to videotape to reduce the blank area around the View, giving
that wall-to-wall effect. Before V37, however, applications
provided overscan through a variety of unfriendly tricks,
because the OS offered no direct support. Some of those
tricks still work under V37, but there is no excuse for using
them anymore. Make it easy on yourself and other
programmers: play by the 2.0 rules.

To allow for the DisplayClip, the 
ViewPort structure had to be extend- 
ed. This was no mean trick. The 
ViewPort cannot be physically in- 
creased in size because Intuition has 
an instance of a ViewPort structure in 
its Screen structure — extending it 
would mean changing the offsets of 
much data in the Screen and would 
break just about every piece of soft- 
ware ever published for the Amiga! 
So, a ViewPortExtra structure was 
created and is "magically" linked to its associated ViewPort
by the OS. 

To get a ViewPortExtra structure, call the V37 function 
GfxNew(): 

struct ExtendedNode *GfxNew(ULONG node_type);

For example:

struct ViewPortExtra *vpe = (struct ViewPortExtra *)GfxNew(VIEWPORT_EXTRA_TYPE);

The ViewPortExtra structure (like ViewExtra) is headed 
by an ExtendedNode, hence the type cast. These are defined 
in graphics/gfxnodes.h. 

Because ViewPortExtra may grow in the future, you must
use the GfxNew() function to allocate one of these structures
to ensure that your software is compatible with future OS ver-
sions. When you finish with the structure, you should return
the memory used to the system with GfxFree():

void GfxFree(struct ExtendedNode *en);

So, now that you have a ViewPortExtra structure, you can
define a DisplayClip. This is simply a rectangle, defined by
the top-left and bottom-right corners, using the units of the
ViewPort's mode. In other words, the units are in LORES
pixels for a LORES ViewPort, HIRES pixels for a HIRES
ViewPort, and so on. The origin of the rectangle is the posi-
tion of graphics' View.

If you take another look at the features of the database, you
will see that DimensionInfo has five types of rectangles de-
fined: Nominal, MaxOScan, VideoOScan, TxtOScan, and
StdOScan. Nominal is the standard DisplayClip for this
mode, such as 320x200 for a LORES noninterlaced ViewPort.
MaxOScan is the maximum DisplayClip the software will
handle, while VideoOScan is the absolute maximum the
hardware can handle. TxtOScan is the DisplayClip in which
all text rendered will be visible. StdOScan is the region that
extends to the bezel of your monitor. You can alter both
TxtOScan and StdOScan from the Overscan preferences editor
on Workbench 2.0.

If you want your application to open with the user's Txt-
OScan for a HIRES ViewPort, first query the database.
Then copy the TxtOScan rectangle from the buffer you
passed to GetDisplayInfoData() into the
ViewPortExtra->DisplayClip rectangle. Next, associate this
ViewPortExtra with its ViewPort by calling VideoControl()
with the VTAG_VIEWPORTEXTRA_SET tag.
Note that you do not have to use any of the defined
DisplayClips; you can create any you like, but anything
larger than MaxOScan is perilous!


COPPER? I HARDLY EVEN 
KNOW 'ER 

The Copper (coprocessor) is a sim- 
ple processor that understands only 
three instructions: MOVE, WAIT, and
SKIP. It is able to WAIT until the 
video beam that sweeps across your 
display has reached at least a certain 
position, and then MOVE data into 
the custom chip registers that make the Amiga different from 
all other computers, such as the bitplane pointers and regis- 
ters that control the display mode. It is this processor that al- 
lows the Amiga to show different display modes on-screen 
at the same time (when you slide screens around). Just like 
any other processor, it executes a list of instructions. 

Many games use their own Copper lists for display tricks, 
but some like to "borrow" and corrupt graphics' Copper lists. 
Most of the time they get away with this, but there are some 
games that make mistaken assumptions about graphics' Cop- 
per lists. The authors of these programs assumed that, be- 
cause the format of the Copper lists did not change between 
early versions of the OS, the lists would not change in the fu- 
ture. They were wrong! 

The format of the Copper lists has changed considerably 
between V35 and V37. Software that expects certain Copper 
instructions always to be at certain offsets from the start of 
the list breaks under V37. Other programs assume that the 
Copper lists always load some chip registers with the same 
data value and therefore never load those registers. These 
programs also break. 

Do not assume anything about the Copper lists. They have 
changed in the past and are likely to change in the future. 
Note well the comment in graphics/copper.h:

/* private graphics data structure */

If you need to play with the system's Copper lists, install
your own UserCopperLists using the graphics.library
macros (CINIT, CMOVE, CWAIT, and CEND) defined in
graphics/gfxmacros.h. Examples of their use are the Sliced-
HAM (SHAM) and DynamicHiRes tricks that allow more
colors on the screen simultaneously by changing the color
palette on every line.

TEXT (VERY FITTING!) 

Some new functions were added to graphics.library's Text
module to help in fitting text into a defined area. Bear in
mind that, under V37, the default font can be of any size (as 
set by the text preferences); you cannot assume anything 
about the font your application will be rendering in, unless 
you specify a particular font in your code. 

The problem with the old TextLength() function was that the
value it returned was not the number of pixels with which the
string was rendered, but the value that was added to the Rast-
Port->cp_x value. If, for example, you rendered the text "Hel-
lo" into a rastport, and you later wanted to delete that text, you
could have used TextLength() to determine the number of pix-
els in the string and deleted the text with that result.

If the text was rendered in italics, however, the value re-
turned from TextLength() (which is the value added to Rast-
Port->cp_x) would have been shorter than the actual num-
ber of pixels used to render the string. V37's new function,
TextExtent(), finds the "bounding box" of a text string, giv-
en the font's size and text attribute (bold, italic, underlined,
and so on):

void TextExtent(struct RastPort *rp, STRPTR string, WORD count,
                struct TextExtent *te);

This function takes a RastPort, a pointer to a string, a count
of the number of characters in the string, and a pointer to a
TextExtent structure, which will be filled by the function for
the result. The new TextExtent structure is defined in
graphics/text.h, and is filled in as follows (from Commodore's
AutoDocs):

te_Width - Same as TextLength() result: the rp->cp_x advance that
    rendering this text would cause.

te_Height - Same as tf_YSize. The height of the font.

te_Extent.MinX - The offset to the left side of the rectangle this would
    render into. Often 0.

te_Extent.MinY - Same as -tf_Baseline. The offset from the baseline to
    the top of the rectangle this would render into.

te_Extent.MaxX - The offset of the right side of the rectangle this would
    render into. Often the same as te_Width-1.

te_Extent.MaxY - Same as tf_YSize-tf_Baseline-1. The offset from the
    baseline to the bottom of the rectangle this would render into.
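
To make the field definitions above concrete, here is a small portable sketch that turns an extent into the rectangle a string occupies once it has been rendered at a given pen position, which is the area you would clear to delete the text. It is an illustration only: struct Rect and struct Extent are stand-ins for the real graphics.library structures, and rendered_box() is a hypothetical helper, not a system call.

```c
#include <assert.h>
#include <stddef.h>

/* Portable stand-ins for the graphics.library structures, for
 * illustration only; real code would include graphics/text.h. */
struct Rect {
    short MinX, MinY, MaxX, MaxY;
};

struct Extent {
    unsigned short te_Width;   /* pen advance, as from TextLength() */
    unsigned short te_Height;  /* font height */
    struct Rect te_Extent;     /* offsets relative to pen x and baseline y */
};

/* Given the pen position (x, baseline y) at which the text was
 * rendered, return the rectangle the text occupies. */
static struct Rect rendered_box(short x, short y, const struct Extent *te)
{
    struct Rect box;

    assert(te != NULL);
    box.MinX = (short)(x + te->te_Extent.MinX);
    box.MinY = (short)(y + te->te_Extent.MinY);
    box.MaxX = (short)(x + te->te_Extent.MaxX);
    box.MaxY = (short)(y + te->te_Extent.MaxY);
    return box;
}
```

For a plain 8-pixel-high font with a baseline of 6, a five-character string has MinY = -6 and MaxY = 1, so text drawn at baseline 20 covers rows 14 through 21. With an italic style the MinX and MaxX offsets can extend past the pen advance, which is exactly the case te_Width alone does not describe.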

Another feature missing in previous versions of the OS is 
a function to determine how many characters of a string 
would fit wholly in a defined area. The TextFit() routine cures 
that problem in V37. This function also fills a TextExtent 
structure: 

ULONG TextFit(struct RastPort *rp, STRPTR string, UWORD strlength,
              struct TextExtent *te, struct TextExtent *constraining_te,
              WORD strdirection, UWORD constraining_BitWidth,
              UWORD constraining_BitHeight);

Constraining_te is the text extent that the text must fit in.
If this is NULL, then constraining_BitWidth/Height defines
the text extent instead. Strdirection can be either 1 or -1. If 1,
the string is anchored at the left side of the box. If -1, the
string is anchored at the right side of the box, and the point-
er to the string should point to the string's last character.

The following crude fragment of code prints the first n let- 
ters of the alphabet that will fit in a box that is 200 pixels wide 
by 100 pixels high, given the rastport's font and style. Note 
that if the font is taller than 100 pixels, then no characters will 
fit in the box. 

void alphabet(struct RastPort *rp)
{
#define STRING "abcdefghijklmnopqrstuvwxyz"
#define LENGTH strlen(STRING)
#define WIDTH 200
#define HEIGHT 100

    struct TextExtent *te;
    ULONG count;

    if (te = AllocMem(sizeof(struct TextExtent), MEMF_CLEAR))
    {
        count = TextFit(rp, STRING, LENGTH, te, NULL, 1, WIDTH, HEIGHT);

        if (count)
        {
            Move(rp, 0, HEIGHT);
            Text(rp, STRING, count);
        }

        FreeMem(te, sizeof(struct TextExtent));
    }
}



FONTS 

Before V37, if you tried to open a font and the size you want- 
ed was not available, then the open failed. Under V37 the font 
size specified will be created (scaled) from the closest available 
defined size. Therefore, your application should be prepared 
to handle fonts of any size. The routine that does the actual 
scaling is called BitMapScale() and can be used by your appli- 
cations. Look in graphics/scale.h for a description of the struc- 
ture this function uses. For an example of usage and details on 
known bugs and limitations, study the sample source code in 
the Shanson drawer of the accompanying disk. 

Finally, the TextFont structure needed to be extended un-
der V37. Because many applications use embedded TextFont
structures in their code, such as:

struct TextFont myTextFont;

the structure could not be directly expanded without break- 
ing much existing software. So, whenever you call Open- 
Font(), the system creates a new TextFontExtension and mag- 
ically attaches it to the TextFont structure returned. The 
TextFontExtension is removed when you call CloseFont(). 
Some applications, however, use their own hardcoded fonts
that are never opened with OpenFont(), but the system still
creates the TextFontExtension when you call SetFont() for
that font.

This creates a problem: The system has allocated memory 
for the new structure, but has no way of knowing when to 
return that memory, because the font is never closed. Conse-

Continued on p. 63

Clean Up Your Programs

Enforcer and Mungwall help you find your errors
before the public ever sees them.



By Carolyn Scheppner 



HAVE YOU EXPERIENCED any of the following prob- 
lems with your own or another company's software? 

• The program runs well on your system, but other users re- 
port it has problems on their systems. 

• The program runs well by itself but has problems running
with or after other programs. 

• The program runs well most of the time but occasionally 
crashes or fails for no apparent reason. 

Thanks to two powerful new debugging tools — Enforcer 
and Mungwall — you should be encountering such problems 
less. When used correctly during product development and 
testing, these tools (found in the Scheppner drawer) catch 
the most common causes of these problems — the use of 
NULL pointers, uninitialized pointers, improperly initialized 
structures, improper memory usage, and overwritten mem- 
ory allocations. In fact, many companies now require that all 
of their in-house software pass Enforcer and Mungwall test- 
ing, and have also added this requirement to their contracts 
for outside development. 

HOW THEY CAN HELP 

Written by Bryce Nesbitt, Enforcer is an MMU-based de- 
bugging tool. An MMU is a memory management unit that 
can be configured to trap accesses to specified ranges of 
memory. The 68030 chip has a built-in MMU, and most 
68020 boards contain separate MMUs. Because it is MMU- 
based, Enforcer can trap reads and writes of low memory 
and nonexistent memory the instant these accesses (also 
known as "Enforcer hits") occur. This allows you to catch us- 
age of NULL pointers and some uninitialized pointers, and 
even accesses that would have trashed low memory or oth- 
erwise crashed the system. Some of these accesses (such as 
reads of address 0) may seem harmless on your system, but 
they could cause your program to fail in the field. If you are 
developing commercial software (or any software that you 
plan to distribute), it is extremely important that you invest 
in an MMU or, at the very least, make sure that your soft- 
ware is tested on machines with MMUs, Enforcer, and 
Mungwall. As more of the development community begins 
running these tools, software that is unusable in their pres- 
ence will be abandoned. 

Enforcer is even more powerful when used in combination 
with Mungwall, a combination memory munging tool by 
Ewout Walraven that is based on Bryce Nesbitt's Memmung 
and Randell Jesup's Memwall. The "mung" part of Mungwall 
fills all of free memory (and all subsequently freed memory) 

18 November/December 1991 



with nasty, odd 32-bit values, such as $DEADBEEF. These
values are almost guaranteed to cause serious problems for 
any program that uses uninitialized pointers or structures, or 
uses memory or allocations after they are freed. Such usages 
can occur, for example, when allocations are not freed in the 
correct order. 

Mungwall uses specific nasty 32-bit values in its memory 
munging to help you diagnose any problems: 

• Except when Enforcer is running, location 0 is set to
$C0DEDBAD so that programs referencing location 0 will
not find a 0 value. Programs referencing location 0 as a string
will get a string of high ASCII characters rather than a NULL
string, and programs using NULL structure pointers should
be irritated into crashing. When Enforcer is running, this is
not necessary because, with location 0 containing 0, Enforcer
can trap these low-memory accesses by itself.

• On startup, all free memory is munged with $ABADCAFE.
If this number shows up, someone is referencing memory in
the free list.

• Except when MEMF_CLEAR is set, memory is premunged
on allocation with $DEADF00D. When this appears in an
Enforcer report, the caller is allocating memory and not
initializing it before using it.

• Memory is filled with $DEADBEEF when it is deallocated,
encouraging programs reusing freed memory to crash.
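
The idea behind these fill values can be shown in a few lines of portable C. This is a sketch of the technique, not Mungwall's code: only the four 32-bit values come from the list above, while the macro names and the mung_fill() and looks_munged() helpers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fill values taken from the list above; the names are made up. */
#define MUNG_LOCATION_ZERO 0xC0DEDBADUL
#define MUNG_FREE_AT_START 0xABADCAFEUL
#define MUNG_ON_ALLOC      0xDEADF00DUL
#define MUNG_ON_FREE       0xDEADBEEFUL

/* Hypothetical helper: fill `bytes` of longword-aligned memory with a
 * 32-bit pattern. A tail shorter than four bytes is left untouched. */
static void mung_fill(void *mem, size_t bytes, uint32_t pattern)
{
    uint32_t *p = (uint32_t *)mem;
    size_t i;

    assert(mem != NULL);
    for (i = 0; i < bytes / sizeof(uint32_t); i++)
        p[i] = pattern;
}

/* Hypothetical check: does this longword match one of the memory
 * fill patterns (free list, fresh allocation, or freed block)? */
static int looks_munged(uint32_t value)
{
    return value == MUNG_FREE_AT_START ||
           value == MUNG_ON_ALLOC ||
           value == MUNG_ON_FREE;
}
```

Because every pattern is odd, a program that treats one as a pointer and uses it for a WORD or LONG access on a 68000 takes an address error instead of quietly reading stale data, and the distinct values tell you which kind of memory was misused.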

The "wall" part of Mungwall allocates extra memory be- 
fore and after every memory allocation and fills this "wall" 
with a fill pattern and other information. On each dealloca- 
tion, Mungwall checks to make sure that the deallocation 
size matches the size of the allocation and that the walls have 
not been overwritten. Mungwall also watches for 0-size allo- 
cations, 0-size deallocations, and 0-address deallocations. In 
addition, Mungwall has an option to "snoop" and report on 
all memory allocations and deallocations for all tasks or spec- 
ified tasks. This can be useful when tracking down memory 
losses. You can then run the voluminous snoop output 
through the snoopstrip program, which will throw away all 
matching alloc/dealloc pairs. 
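
The wall check can be sketched in portable C as well. This is a simplified illustration under stated assumptions, not Mungwall's implementation: the 16-byte wall size, the 0xBB fill byte, and the names wall_alloc() and wall_free() are all chosen for the example, and standard malloc() and free() stand in for the Exec calls.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define WALL_SIZE 16    /* assumed wall size, in bytes */
#define WALL_FILL 0xBB  /* assumed wall fill byte */

/* Allocate `size` bytes with a filled wall before and after the block. */
static void *wall_alloc(size_t size)
{
    unsigned char *raw;

    if (size == 0)
        return NULL;    /* 0-size allocations are refused */
    raw = (unsigned char *)malloc(size + 2 * WALL_SIZE);
    if (raw == NULL)
        return NULL;
    memset(raw, WALL_FILL, WALL_SIZE);
    memset(raw + WALL_SIZE + size, WALL_FILL, WALL_SIZE);
    return raw + WALL_SIZE;
}

/* Release a block from wall_alloc(). Returns the number of wall bytes
 * found overwritten (0 means both walls are intact), or -1 for a
 * NULL address or a 0 size. */
static int wall_free(void *mem, size_t size)
{
    unsigned char *raw;
    int hits = 0;
    size_t i;

    if (mem == NULL || size == 0)
        return -1;
    raw = (unsigned char *)mem - WALL_SIZE;
    for (i = 0; i < WALL_SIZE; i++) {
        if (raw[i] != WALL_FILL)
            hits++;     /* damage before the allocation */
        if (raw[WALL_SIZE + size + i] != WALL_FILL)
            hits++;     /* damage after the allocation */
    }
    free(raw);
    return hits;
}
```

The sketch trusts the size passed to wall_free(). A real tool would also record the original size alongside the wall so that a mismatched deallocation size can be reported rather than believed.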

Mungwall may be used without Enforcer and on non-MMU
machines. If you don't have an MMU, at least test with Mung-
wall alone. If you are using uninitialized memory or memo-
ry after it is freed, Mungwall should help your program to
crash immediately (as it might crash on a user's machine

DEBUGGING ARRANGEMENTS

Enforcer and Mungwall both out- 
put their debugging information to 
the serial port at the current baud- 
rate setting of your machine's serial 
hardware. After powerup, your seri- 
al hardware is set to 9600 baud, but 
you can modify this by bringing up 
a terminal package and setting a 

baud rate. The best debugging setup is to connect your Ami- 
ga via a null-modem serial cable to an Amiga or other com- 
puter running a terminal package with ASCII-capture capa- 
bility. Both Enforcer and Mungwall include CTRL-Gs in their 
output to generate a beep with most terminal packages, and 
the ASCII-capture capability will allow you to capture all se- 
rial debugging output to a file for examination. This is espe-
cially useful when combined with serial kprintf() (debug.lib)
debugging statements in your code, such as:

kprintf("About to close window $%lx\n", win);

A clean way to add conditional debugging statements to a
C program is to use a macro such as D(bug()) by including
lines similar to those below in the program. Set MYDEBUG
to 1 to turn on debugging. Set bug to printf for printf de-
bugging, to kprintf (and link with debug.lib) for serial de-
bugging, or to dprintf (and link with ddebug.lib) for parallel
debugging. The D(bug()) macro is neater in your code be-
cause it can be indented and you need not surround it with
any #ifdef directives yourself. Just be careful, and remember
to put two close parentheses before the semicolon at the end
of each D(bug()) statement.

/******* debug macros *******/
#define MYDEBUG 1
void kprintf(UBYTE *fmt,...);
void dprintf(UBYTE *fmt,...);
#define DEBTIME 0
#define bug printf
#if MYDEBUG
#define D(x) (x); if(DEBTIME>0) Delay(DEBTIME);
#else
#define D(x) ;
#endif /* MYDEBUG */
/******* end of debug macros *******/



You could then use the macro as 
follows: 

win = OpenWindow(&mynewwin);
D(bug("Opened window at $%lx\n", win));

If you have only one machine, 
you can debug to a serial or paral- 
lel printer (with Enforcer.par and 
Mungwall.par). In a pinch, if you 
have a modem attached to your ma- 
chine you may have some success 
doing serial debugging to yourself. 
If you bring up a terminal package 
and set it to your modem's baud rate, your terminal program 
can capture serial debugging output. However, you may lose 
bytes, especially at low baud rates, if the debugging output 
is large. 

By using Enforcer and Mungwall while you are develop- 
ing your software, you can catch problems as soon as they are 
introduced and greatly cut down your debugging time. It is 
especially useful to place conditional remote debugging state- 
ments in your code as you write each routine, so you can 
quickly turn them on when a problem occurs. You will be 
able to pinpoint the troublespot easily when the kprintf (or 
dprintf) output is intermixed with the Enforcer or Mungwall 
output. The remote debugging commands kprintf and 
dprintf are available in the linker libraries debug.lib (serial)
and ddebug.lib (parallel), respectively. These linker libs are 
supplied with some compilers, on Commodore's Native De- 
veloper Update disks, and in the Scheppner drawer of the ac- 
companying disk. If you prefer, you can also use a source-lev- 
el or single-stepping debugger in combination with Enforcer 
and Mungwall when tracking down a problem, to single- 
step through your code until the hit occurs. 

A different low-level method of locating Enforcer hits is to 
disassemble program memory where the hit occurred (or, if 
the hit occurred in ROM, to try the nonROM addresses 
shown in the Enforcer stack dump line), then to match up this 
disassembly with your own code. When working in assem- 
bly, you can just compare the disassembly to your source. 
Otherwise, you can take the hex values of a sequence of po- 
sition-independent 68000 instructions near the hit (i.e., no 
addresses except for offsets and branches) and search for this 
binary pattern in your object modules. If you find the pattern,
do a mixed source and object disassembly of that object mod-
ule (for example, with SAS' OMD you could OMD
>ram:dump mymodule.o mymodule.c) and then look in the
output for instructions matching those where the hit occurred.

Debugging Tools



Figure 1: Sample Enforcer Output 



Program Counter (approx.) = 343F4A     Fault address       = 14
User stack pointer        = 348734     DOS process address = 339590
Data: DDDD0000 DDDD1100 DDDD2200 DDDD3300 DDDD4400 DDDD5500 DDDD6600 DDDD7700
Addr: AAAA0000 AAAA1100 AAAA2200 AAAA3300 AAAA4400 AAAA5500 AAAA6600 00002E28
Stck: 00210D70 00000FA0 00339F84 BBBBBBBB BBBBBBBB BBBBBBBB BBBBBBBB BBBBBBBB
READ-WORD (---)(-)(-) SR=0008 SSW=0161
Background CLI, "lawbreaker", Hunk #0, Offset $5A



ENFORCER HIT EXPLANATIONS 

Enforcer gives you lots of valuable information to help de- 
bug your program's hits. Consider the sample Enforcer hit 
caused by a program called lawbreaker shown in Figure 1. 
Here is an explanation of the most important items: 

Program Counter: The memory address at which the pro-
gram was executing instructions when the hit occurred. For
some types of hits this often will be the address of the in-
struction after the hit. Note that if your program passes a
bad pointer or an improperly initialized structure to a system
ROM routine, you may cause the ROM code to read or write
to an illegal address.

Fault Address: The address where the illegal access oc-
curred. In this example, the illegal access occurred at address
$14, and, as specified later in the debugging output, this ac-
cess was a READ-WORD access. Therefore, the illegal mem-
ory access was an attempt to read a WORD (two bytes) at ad-
dress $14. Low-memory accesses are often caused by NULL
pointers to structures. If, for example, your code or a ROM
routine references a structure member at offset $20 and you
provide or use a NULL structure pointer, Enforcer will pick
up a hit at address $20.
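
The link between a NULL structure pointer and a low fault address can be checked with offsetof(). The structure below is hypothetical and exists only to place a WORD-sized member at offset $14, matching the sample hit; it is not a system structure, and fault_address() is an illustrative helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical structure: five longwords, then a WORD-sized member. */
struct Example {
    uint32_t pad[5];    /* offsets $00-$13 */
    uint16_t flags;     /* offset $14 */
};

/* Address touched when the member at `member_offset` is read through
 * a structure pointer holding `base`. With a NULL pointer (base 0),
 * the fault address is simply the member's offset. */
static unsigned long fault_address(unsigned long base, size_t member_offset)
{
    return base + (unsigned long)member_offset;
}
```

Working backward from a hit is the same calculation in reverse: take the fault address, look for a member at that offset in the structures your code handles, and check which pointer to that structure could have been NULL.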

Register Dump: Shows the contents of the program's reg- 
isters and stack at the time of the hit. This information can 
help assembly programmers and programmers who like to 
debug at a low level. 

Access Type: In this example, the access was a READ-
WORD and probably accessed a WORD-sized structure
member. A READ-BYTE access is generally caused by a bad 
string pointer, while a READ-LONG is usually caused by a 
bad pointer or a bad pointer within a structure. WRITE- 
WORD, WRITE-BYTE, and WRITE-LONG accesses indicate 
that you are causing memory to be trashed and can be caused 
by bad pointers or bad code. Occasionally you will see an IN- 
STRUCTION access of illegal memory, generally caused by 
trashed code, a trashed return address on your stack, or an 
invalid library base. 

Program Name and Hunk Offset: The program name is 
the name of the task or command that was executing when 
the hit occurred. If possible, Enforcer also provides a hunk 
offset to the program counter's reading if the hit occurred 
within the program's own code instructions. 

SAMPLE MUNGWALL OUTPUT 

Mungwall provides a similar volume of debugging details.
Study the hits by a program called mungwalltest that are shown
below. Following each hit I added an explanation in parenthe-
ses. For reference, the arguments for memory functions are
AllocMem(size,type) and FreeMem(address,size). The A: and C:
addresses are Mungwall's guesses at the address from which
AllocMem() was called. A: is the address for an assembler caller.
C: is the address for a C caller, assuming a standard stub. Be-
cause Mungwall is wedged into the memory allocation func-
tions, it can only guess the caller's address based on what is
pushed on the stack. The "at" address on the first line of a
Mungwall hit is the task address of the caller. Note that Mung-
wall has special code to prevent trapping the partial (wrong
size) deallocations that are performed by layers.library. If any
other debugging tools are also wedged into AllocMem() and
FreeMem(), Mungwall's A: and C: addresses may be thrown off
by additional information pushed on the stack, and Mungwall
will also be unable to screen out partial layers deallocations
(which will show up as hits on your task's context).

AllocMem(0x0,10000) attempted by 'mungwalltest' (at 0x339590)
    from A:0x35C03A C:0x35677E SP:0x35CFC0
(tried to allocate 0 bytes of memory)

FreeMem(0x0,16) attempted by 'mungwalltest' (at 0x339590)
    from A:0x35C068 C:0x3567C4 SP:0x35CFB8
(tried to free memory with a NULL pointer)

FreeMem(0x33BD10,0) attempted by 'mungwalltest' (at 0x339590)
    from A:0x35C068 C:0x3567D4 SP:0x35CFB0
(tried to free 0 bytes of memory)

Mis-aligned FreeMem(0x33BD14,16) attempted by 'mungwalltest' (at 0x339590)
    from A:0x35C068 C:0x3567E2 SP:0x35CFA8
(deallocation address known incorrect because not aligned like alloc)

Mismatched FreeMem size 14!
Original allocation: 16 bytes from A:0x35C03A C:0x3567A0 Task 0x339590
Testing with original size.
(deallocation size does not match allocation size)

19 byte(s) before allocation at 0x33BD10, size 16 were hit!
>$: BBBBBBBB BBBBBBBB BB536572 6765616E 74277320 50657070 65722000
(program trashed bytes that precede its allocation)

8 byte(s) after allocation at 0x33BD10, size 16 were hit!
>$: 75622042 616E6400 BBBBBBBB BBBBBBBB BBBBBBBB BBBBBBBB BBBBBBBB
(program trashed bytes that follow its allocation)

As you can see, Mungwall alone can catch a large variety 
of memory-related software problems. But one of the most 
important benefits of Mungwall is that by filling freed mem- 
ory with nasty 32-bit values, it can force subtle memory mis- 

Continued on p. 63

Efficient 
Assembly Programming 



By Jamie Purdon 



WHEN DESIGNING AND writing "commercial"
code, the money is in getting the job done. This means de- 
veloping good, high-level design, coding plans, and sticking 
with a schedule (even when it's self-designed and imposed). 
The fun part is writing the low-level, hardware-banging code. 

You can program in a system-friendly manner, yet still
"bang on the hardware." One way is to keep the Blitter re-
served with OwnBlit() and DisownBlit() for only extremely
short periods of time. Another example is to limit your mouse
handling (mousebutton and mousemove events) to IDCMP
messages.

Still, sometimes you need to do something nasty. For ex- 
ample, to plot brushstrokes much faster than Intuition will 
give mouse-position reports, you need to read the mouse 
hardware. In this case, be sure to take into account Intuition's 
ActiveWindow (be sure it's one of yours) and make sure that 
you have replied to all outstanding IDCMP messages. In oth- 
er words, make sure that the system has not queued up any 
input (messages) before you read from an input device. 

The general idea is to use high-level ROM routines and to 
write the fastest-executing assembly language routines. Us- 
ing the built-in libraries makes efficient use of a resource 
that's always present. It allows for very small code and high 
functionality. The ROM routines are free. Use them. 

Assembly language programming can be as efficient as
that of most high-level languages. You can code good appli-
cations, albeit ones that execute somewhat slowly, by using
the built-in routines whenever possible. Of course, you gain
the most speed (at the expense of code size) when you "put
the pedal to the metal" and use assembly language.

Use assembly language for speed or not-provided-for (low-
level) capabilities. For example, an application might contain
a screenful of gadgets with custom imagery. Now, Intuition
seems to take forever when displaying these gadgets, be-
cause so many separate blits occur. Essentially, Intuition
makes repeated calls to DrawImage(), which uses
graphics.library, which uses the Blitter. A workable solution is to
use the complement mode for gadget highlighting and no
imagery for the gadget, as far as Intuition cares. You can
write custom code to draw the gadget imagery all at once.
These routines will usually run quicker than Intuition, un-
less, of course, you use repeated calls to DrawImage(). You
still get the high-level advantage of letting Intuition handle
the gadget interaction, but add the low-level advantage of de-
creased gadget-rendering time.

COUNTING CYCLES 

Cycle counting is an optimization technique that lets you eke
the best performance out of your Amiga's CPU. Often tedious,
it involves looking up (or memorizing) the timing, in cycles,
of every instruction. Cycles are the number of "ticks" of the
system clock that each instruction takes. They are always in
multiples of two; the minimum cycle count is four. (Even a
NOP, or No Operation, instruction takes four cycles to
execute.) I'll go into more detail later, but I want to
emphasize now that a stock Amiga can execute a theoretical
maximum of only 1,750,000 (7MHz/4) instructions per second.
Real-world timing averages are often much worse than this.
When working with graphics, it is ideal to update the screen
60 times per second; however, there are usually not enough
cycles in 1/60 of a second.
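
To put the frame-rate problem in numbers, here is a small sketch in
Python, used purely as a desk calculator rather than as Amiga code. It
assumes the round figures quoted above (a 7MHz clock, a four-cycle
minimum, 60 updates per second); the function names are invented for
illustration.

```python
CLOCK_HZ = 7_000_000       # the article's round figure for a stock Amiga
MIN_CYCLES = 4             # shortest possible instruction
FRAMES_PER_SECOND = 60


def cycles_per_frame(clock_hz=CLOCK_HZ, fps=FRAMES_PER_SECOND):
    """Whole CPU cycles available during one screen update."""
    return clock_hz // fps


def max_instructions_per_frame(clock_hz=CLOCK_HZ, fps=FRAMES_PER_SECOND):
    """Best case: every instruction takes the four-cycle minimum."""
    return cycles_per_frame(clock_hz, fps) // MIN_CYCLES


if __name__ == "__main__":
    print(cycles_per_frame())
    print(max_instructions_per_frame())
```

Even in this best case, the budget comes to fewer than 30,000
minimum-length instructions per update, and real code with memory
accesses gets far fewer.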

Many people never get around to cycle counting when pro- 
gramming the Amiga. Besides that drudgery, the dearth of 
programming utilities that display individual instruction cy- 
cle timings makes it difficult. You usually end up consulting 
a paper reference. I use the MC68000 16-/32-bit Microprocessor 
Programming Reference Card published by Motorola. It has all 
the instructions on one side of an 8½x25-inch fold-out card.

68000 TIMINGS 

Assembly language is a natural for cycle counters because 
what you code is what you get. A 680x0 instruction divides 
into three parts: the opcode, the source address, and the desti- 
nation address. Either (or both) of the addresses can be CPU 
registers. Each instruction has many variations that differ in 
the operands used. Programmers control the operand selec- 
tion, and smart ones consider this choice when optimizing. 
The operands used directly affect how long an instruction
takes to execute.

There are some tricks to remembering cycle timings. Some 
instructions operate on only one address (or register) and 
have only two parts, an opcode and a combined source and 
destination register. Each part of an instruction takes a spe- 
cific number of cycles (ignoring cache considerations). Most 
instructions have many variations when you consider all the 
different addressing mode possibilities. It's easiest to devel- 
op some general rules for (address and) memory-access tim- 
ing. Each WORD-size (16-bit) memory access "costs" two cy- 
cles. LONG (32-bit) memory references cost four cycles (a 
LONG is two WORDs). It's noteworthy that these timings are 
based on memory access considerations (and are reduced 
when a cache is available). 

For the best memory-access timings, I recommend using
registers, because they do not cost any extra cycles for
WORD-size operations. Many LONG-size (register-to-register)
operations, however, cost an additional four cycles. For
example, ADD.W d0,d1 takes four cycles, while ADD.L d0,d1
requires eight cycles. Essentially, the CPU is a 16-bit machine.
It requires extra time to deal with LONGWORDs. The 68000
is called a 16-/32-bit CPU because it is most efficient with
16-bit data, but the memory addressing architecture is set up
for 32-bit operations.

Don't take these numbers for granted. Occasionally review 
the cycle counts of all instructions. This will help you plan 
code sequences with minimal cycle counts. You'll find that 
register-to-register MOVE instructions take the same number 
of cycles for both WORD and LONG operations. 

For efficiency, the Scc instructions come in handy when
dealing with flag bytes. They act very much like the Bcc
instructions. Instead of branching, however, they set or clear
a byte. This byte can be in memory or (the lowest byte in) a Dn
data register.

Another common trick is to use the MOVEM instructions
wherever possible. Note that a MOVEM.W into an An address
register sign-extends the WORD into the upper half of the
register. Another way to think of this is that a MOVEM.W will
destroy the upper WORD in an A register.

After you've written some code and are optimizing it, look 
for substitute opcodes (with quicker timing and fewer cycle 
counts). An obvious substitution is replacing ADD # in- 
structions with ADDQ #. Another is to use: 

    MOVEQ #0,Dn

(four cycles) in place of: 

CLR.L Dn 

(six cycles). There are more subtle tricks you can use: ROXL.L 
(ten cycles) is often replaced by ADDX.L (eight cycles). 

    SUBA.L An,An

(eight cycles) is a common instruction for clearing an address 
register. If you have already cleared a data register, howev- 
er, you can replace the instruction with: 

    MOVEA.L Dn,An

(four cycles). If you would rather optimize for small code 
versus execution time (the classic trade-off), then a good in- 
struction to use is: 

    MOVEM.L Zeros,Dn-n/An-n

All this requires is that Zeros be defined as an array of n
(equaling the maximum number of registers) zero-value
LONGWORDs. It's a cycle hog because it costs eight cycles
to clear each register (plus 12 cycles of instruction overhead).
But this opcode requires only four or six bytes of code
space, depending on the Zeros' addressing mode, for any
number of registers to clear.
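
The speed-versus-size trade-off can be tabulated with a short Python
sketch (a calculator, not Amiga code). It uses the cycle counts quoted
above and assumes that each MOVEQ occupies one WORD (two bytes) of
code; the function names are hypothetical, and the comparison covers
data registers only, since MOVEQ cannot load an address register.

```python
def moveq_clear(n_regs):
    """One MOVEQ #0,Dn per register: 4 cycles and 2 code bytes each."""
    return {"cycles": 4 * n_regs, "bytes": 2 * n_regs}


def movem_clear(n_regs, opcode_bytes=6):
    """A single MOVEM.L Zeros,<register list>.

    12 cycles of overhead plus 8 per register, as quoted in the text.
    opcode_bytes is 4 or 6, depending on how Zeros is addressed.
    """
    return {"cycles": 12 + 8 * n_regs, "bytes": opcode_bytes}


if __name__ == "__main__":
    for n in (2, 4, 8):
        print(n, moveq_clear(n), movem_clear(n))
```

The byte counts cover code only. The Zeros array itself also occupies
four bytes per register, although one copy can be shared by every
MOVEM.L that uses it.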

ACCELERATED AMIGAS HAVE CACHES 

A beneficial habit is to design code and data structures 
that function very efficiently when executed on cache-bear- 
ing CPUs, but to use only opcodes that work on all CPUs 
(68000 instructions). To take advantage of newer CPUs with- 
out providing separate code for each CPU, you must take into 
account caches when unrolling loops and designing data 
structures. This provides good performance on a 68000 and 
better than expected improvements (if you consider only 
clock-speed differences) on newer CPUs. 

A cache is a piece of hardware that sits between the CPU 
and the memory. It acts as a very fast (but small) memory and 
a lookup table. The lookup table keeps track of the last n 
memory accesses. Cache memory has the benefit of allowing 
much quicker access to memory than is possible with regu- 
lar memory subsystems. Frequent accesses to the same mem- 
ory location are remembered by the cache and the data is 
available to the CPU much quicker than if an actual memo- 
ry card access occurred. This really helps on Amiga 3000s that 
are upgraded with older, 16-bit memory cards. 

The 68020 has a code cache, while the 68030 has both code 
and data caches. Note that register-based algorithms are 
still fastest (for data manipulation). The 68030's data cache 
will not improve performance for these instructions. The 
68020's code cache is 256 bytes arranged as 64 LONG- 
WORDs. The 68030's two caches are each 256 bytes long as 
well, but are organized differently. Each holds 16 groups of 
four LONGWORDs. 

DATA STRUCTURES 

To take advantage of the 68020 and 68030, group often- 
used variables within the same mod 16 address range. The 
68020 deals with memory in units of four bytes (one LONG- 
WORD). With the 68030's (very fast) burst mode enabled, 
data and instructions are fetched in groups of 16 bytes (four 
LONGWORDs). The cache is loaded with all four LONG- 
WORDs in the mod 16 grouping. (The hardware is optimized 
to fetch the desired LONGWORD first.) Grouping variables 
(and code sections) based on mod 16 grouping allows down- 
ward compatibility while taking advantage of the 68030. If 
you're optimizing strictly for the 68020, disregard data-cache 
considerations and stay with LONGWORD (as opposed to 
WORD) alignment for the code sections: The '020 fetches in
groups of LONGWORDs (the '030 default), but has no
burst-mode facilities like the '030.
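
A quick way to check a planned data layout against the mod 16 rule is
sketched below in Python. The addresses and helper names are
hypothetical; the only fact taken from the text is the 16-byte
(four-LONGWORD) grouping.

```python
LINE_BYTES = 16  # one '030 burst: four LONGWORDs


def cache_line(address):
    """Index of the mod 16 group that holds this address."""
    return address // LINE_BYTES


def share_a_line(*addresses):
    """True if every address falls inside the same 16-byte group."""
    return len({cache_line(a) for a in addresses}) == 1


def align_up(address, boundary=LINE_BYTES):
    """First address at or above `address` that starts a new group."""
    return (address + boundary - 1) // boundary * boundary


if __name__ == "__main__":
    print(share_a_line(0x1000, 0x1004, 0x100C))
    print(share_a_line(0x100C, 0x1010))
    print(hex(align_up(0x1001)))
```

In an assembler source, the same effect would come from whatever
alignment directive your assembler provides, placed ahead of each
group of related variables.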

Remember that, although 68020 upgrades are fairly inex- 
pensive (compared to 68030s and 68040s), the Amiga 3000 
comes with a 68030. You'll probably want to optimize for the 
68030 at some point. (It also comes with a math chip, but I di- 
gress...) 

Keep in mind that the '030 data cache does not work with 
chip RAM. Chip RAM addresses are never cached, because 
the cache can contain invalid data if any of the custom chips 
happen to change chip data. 

UNROLLING LOOPS 

Unrolled loops are much more readable when coded with 
macros. You specify a macro to represent the innermost loop 
code, then type the macro as many times as the loop is to ex- 
ecute. For example, if you want a loop to execute four times, 
you might code: 

    moveq #4-1,d0 ;loop counter, '-1' for dbxx loop
loop:
    INNERLOOPCODE ;macro
    dbf d0,loop

This same loop, unrolled, would look like: 

    INNERLOOPCODE ;macro
    INNERLOOPCODE ;macro
    INNERLOOPCODE ;macro
    INNERLOOPCODE ;macro

At the expense of extra code space, this executes much 
more quickly, because the dbf instruction is never executed. 
You must decide how you feel about the code-size versus ex- 
ecution-time trade-off. 

Loops that iterate for a multiple of two are easiest to un- 
roll. Many loops are not so easily unrolled. The loop counter 
is divided by the number of inner-loop iterations that are un- 
rolled. For example, if the above loop were to execute 320 
times, you could code it as: 

    move.W #(320/4)-1,d0 ;loop counter, /4 (inner loop expanded)
loop:
    INNERLOOPCODE ;macro
    INNERLOOPCODE ;macro
    INNERLOOPCODE ;macro
    INNERLOOPCODE ;macro
    dbf d0,loop

This is still more efficient than the original loop, because
the dbf instruction executes only 1/4 as often.
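
How much does partial unrolling save? The Python sketch below adds up
the cycles spent on dbf alone. The two timing constants are
assumptions taken from the standard 68000 timing tables (they are not
quoted in this article), and the function name is invented for
illustration.

```python
DBF_TAKEN = 10     # assumed: counter not yet expired, branch taken
DBF_EXPIRED = 14   # assumed: counter expired, execution falls through


def dbf_overhead(iterations, unroll=1):
    """Cycles spent on dbf for `iterations` repeats of the loop body,
    with `unroll` copies of the body coded inside each pass."""
    if iterations % unroll:
        raise ValueError("iterations must be a multiple of unroll")
    passes = iterations // unroll
    if passes == 0:
        return 0
    # Every pass but the last branches back; the last one falls through.
    return (passes - 1) * DBF_TAKEN + DBF_EXPIRED


if __name__ == "__main__":
    print(dbf_overhead(320))            # one body per pass
    print(dbf_overhead(320, unroll=4))  # four bodies per pass
```

Under these assumptions, the 320-iteration loop spends 3,204 cycles on
dbf when coded one body per pass, and 804 cycles when four bodies are
coded per pass.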

A good rule to follow is: Unroll loops only until the code 
size approaches 128 bytes. This allows repeatedly called 
subroutines to remain in the cache. Also, the calling code 
will most likely remain in the cache. Another reason to keep 
unrolled loops small is that a BRA instruction at the bottom 
of a loop can be the short form (BRA.S), which executes 
quicker. This short form requires a signed eight-bit offset
(a reach of about 128 bytes in either direction).

CPU AND MATH-CHIP CONSIDERATIONS 

Every CPU in the 680x0 family recognizes the basic 68000 
instruction set. Each newer CPU has new instructions that 
will not work on the earlier CPUs. I recommend that you ig- 
nore these instructions (unless you wish to write CPU-spe- 
cific code) until a very low-cost 680x0 machine is available. 



I don't recommend providing code that is optimized for a
68030 (for example) and somewhat crippled on a 68000, un- 
less you wish to spend the time and effort to write and test 
code for different CPUs. 

Commodore's math libraries let you write just one piece of 
code that will run on all math chips. However, they have the 
penalty of subroutine overhead. They also cost extra pro- 
gramming and testing time. You can eliminate much of the 
overhead by using the CPU's integer math instructions. Except
for certain high-range applications, such as CAD or ray
tracing, CPU-based math is usually faster than math on a
hardware math accelerator. I recommend using non-CPU-specific
integer math whenever possible.

WHAT ABOUT FLOATING-POINT MATH? 

I often use a combined fixed- and floating-point method to 
keep track of the number of binary fractional bits. Binary 
fractions are not so hard to deal with: They work according 
to the same rules as decimal fractions. You can add and sub- 
tract only numbers with the same number of fractional bits. 
You can multiply any two numbers, however, as long as you 
remember that the result contains n fractional bits — where n 
is the total of the number of fractional bits of both operands. 
You can add and remove fractional bits with one (shift-type) 
machine language instruction. 
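
The bookkeeping rules above can be modeled in a few lines of Python.
This is only a sketch of the arithmetic, not Amiga code; each value is
carried as a pair (integer, number of fractional bits), and all of the
function names are hypothetical.

```python
def to_fixed(value, frac_bits):
    """Scale a real number to an integer with `frac_bits` fraction bits."""
    return int(round(value * (1 << frac_bits)))


def fixed_add(a, a_bits, b, b_bits):
    """Addition is legal only when both operands carry the same
    number of fractional bits."""
    if a_bits != b_bits:
        raise ValueError("align the fractional bits before adding")
    return a + b, a_bits


def fixed_mul(a, a_bits, b, b_bits):
    """Any two numbers may be multiplied; the fraction bits add up."""
    return a * b, a_bits + b_bits


def drop_bits(value, frac_bits, remove):
    """Remove fraction bits with one right shift."""
    return value >> remove, frac_bits - remove


if __name__ == "__main__":
    a = to_fixed(1.5, 8)    # 8 fractional bits
    b = to_fixed(0.25, 4)   # 4 fractional bits
    product, bits = fixed_mul(a, 8, b, 4)
    print(product, bits, product / (1 << bits))
```

Here 1.5 (with eight fractional bits) times 0.25 (with four) yields a
product carrying 12 fractional bits, which still reads back as 0.375.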

Arguments in favor of (CPU) integer-based math hold true 
for the 68040. On board the CPU, the 68040 has a math chip 
that allows floating-point math functions to execute faster 
(in fewer cycles) than any outboard math chip. The integer 
functions are still faster. If you can get away with 16- or 32- 
bit precision, it's still faster to use integer (680x0-generic) 
functions. Plus, this offers the benefits of being CPU-com- 
patible and following my philosophy of coding for a 68000 
but making use of 680x0 CPUs' "transparent" features. Also,
in assembly language, integer math is a natural; it is common
to the whole 680x0 family.

One caveat with fixed- and floating-point fractions is to not 
spend a lot of cycles removing fraction bits. Beware that the 
shift instructions are relative cycle hogs: They typically take 
6+2*number_of_bits cycles. A good way to avoid this over- 
head is to remove bits (by shifting) only infrequently. An- 
other trick is to use the four-cycle add instructions for left 
shifts. 
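
The break-even point between the two left-shift methods is easy to
work out. The Python sketch below uses the figures from the text (a
shift costs 6+2*number_of_bits cycles, an add costs four) and assumes
WORD-size register operands, with one add instruction per bit shifted.
The function names are invented for illustration.

```python
def shift_cycles(n_bits):
    """One shift instruction moving n_bits positions."""
    return 6 + 2 * n_bits


def add_cycles(n_bits):
    """n_bits repeated four-cycle adds of a register to itself,
    each of which doubles the value (a left shift by one)."""
    return 4 * n_bits


def cheaper_left_shift(n_bits):
    """Pick the method with fewer cycles; a tie goes to the single
    shift instruction, because it is smaller."""
    if add_cycles(n_bits) < shift_cycles(n_bits):
        return "add"
    return "shift"


if __name__ == "__main__":
    for n in range(1, 5):
        print(n, add_cycles(n), shift_cycles(n), cheaper_left_shift(n))
```

With these numbers, the add trick wins for shifts of one or two bits,
ties at three, and loses beyond that.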

The fixed- and floating-point method works very well with 
image processing applications. Most image-processing algo- 
rithms deal with eight-bit source operands: RGB colors are 
normally stored in eight bits per component, and blending, 
shading, and antialiasing algorithms usually deal with eight 
or fewer bits for the "blend factor." Even when you add up
the bits, two eight-bit sources multiply to yield only a
16-bit number, which conveniently fits in a WORD-sized result
and, ideally, in a CPU register. Here's an example that
blends two numbers in a 60/40 ratio:

nbits      set 8
percent    set 60
altblend   set ((1<<nbits)*(100-percent))/100
mainblend  set (1<<nbits)-altblend

    mulu #mainblend,d0 ;A blend, result has fraction
    mulu #altblend,d1 ;B blend
    add.W d1,d0 ;(A*frac)+(B*(1-frac)) = result <<nbits
    asr.W #nbits,d0 ;remove nbits = 60%d0 + 40%d1, byte valid
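
The arithmetic of the blend fragment can be checked off-line with a
small Python model. It assumes that altblend is meant to be
(1<<nbits)*(100-percent)/100 with integer division, as an assembler
would evaluate it, and it keeps only the low byte of the result, as
the "byte valid" comment suggests. The Python names are hypothetical
stand-ins for the assembler symbols.

```python
NBITS = 8
PERCENT = 60
ALTBLEND = ((1 << NBITS) * (100 - PERCENT)) // 100   # weight for B
MAINBLEND = (1 << NBITS) - ALTBLEND                  # weight for A


def blend(a, b):
    """60/40 blend of two eight-bit values, mirroring the
    mulu / mulu / add / shift sequence."""
    total = a * MAINBLEND + b * ALTBLEND   # carries NBITS fraction bits
    return (total >> NBITS) & 0xFF         # drop the fraction, keep a byte


if __name__ == "__main__":
    print(MAINBLEND, ALTBLEND)
    print(blend(200, 100))
```

Because the two weights always total 256, blending two equal values
returns that value unchanged, and the 16-bit sum can never overflow a
WORD.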

How well does all this theory convert to real-world examples?
Below (and in the Purdon drawer of the accompanying
disk) are some examples of routines I use over and over in
my projects. Let's examine how they work.

The "CUSTOMIZE" comments refer to specific lines of 
code that are application dependent. You can use absolute- 
or address-register-relative (stack-pointer or base-page) ad- 
dressing, but I recommend a base-page setup. Base-page ad- 
dressing allows for WORD-size (16-bit) addressing, which is 
quicker than absolute addressing because, in terms of mem- 
ory cycles, only a WORD (not a LONGWORD) needs to be 
accessed when the CPU fetches an instruction. 

CHECK-FOR-MESSAGE MACRO 

A macro, the CHECK4MSG routine checks for a pending
message. I use it within sections of code that I may want to
abort without spending much time checking for an abort
condition. The standard method is to simply check your message
port's signal bit, but this tells you only if messages are
pending. Often you may not want to abort unless a specific
message is detected. Standard methods do not tell how many
messages are pending, nor provide any useful information
about the types of pending messages. CHECK4MSG works
differently: It checks for a message, removes it from the
message port, and provides the ability to scan the incoming
message list. Of all the built-in routines, Exec's GetMsg()
library call is the closest to CHECK4MSG. Functionally it is
slightly different, but CHECK4MSG offers the obvious advantage
of quicker execution, because it's a macro, not a subroutine
call.

Remember, subroutine calls generally take 34 cycles. The 
breakdown is: An RTS instruction takes 16 cycles, and a typ- 
ical call-subroutine instruction takes 18. (BSR takes 18 cycles, 
as does JSR _LVOwhatever(a6). JSR abs.l requires 20 cycles.) 
So, plan on spending at least 34 cycles every time you call a 
subroutine. The following code is so short that, if it were a 
subroutine, the 34-cycle overhead would represent a signifi- 
cant portion of the execution time. Take a look: 

CHECK4MSG: MACRO
    lea OnlyPort(BP),A0 ;a0=msg port adr, CUSTOMIZE
    lea MP_MSGLIST(A0),A0 ;TOP of list
    cmp.l 8(A0),A0 ;beq if empty
    ENDM

;example usage
;   CHECK4MSG
;   bne abort_have_a_msg

While this is really a textbook example, I want to point out 
the options that it makes available. One of the neatest is the 
ability to prescan the message list, letting you scan for a 
cancel code without removing any messages from the list 
(very user-friendly). It also enables you to control the time 
at which a message is removed from the pending-message 
list, because you do the removing. You can remove one 
without affecting any of the other messages, which is handy 
when you want to get rid of a certain class of messages — 
such as mousemoves — without affecting rawkey or mouse- 
button events. 

If you implement prescanning of a message list, you will
probably want to use Intuition's rawkey messages. Of course,
you should also use console.device's RawKeyConvert() routine
to decipher the keypresses, making your program compatible
with the user's Preferences key map. There are times,
however, when you may simply want to check for a keycode
without calling RawKeyConvert(). Certain keys (the function
keys, the spacebar, and the arrow keys) seem to retain
the same codes, with all keymaps, on most Amiga models.

If you do program for these specific rawkey codes, be
warned that your application will not be CDTV-friendly.
CDTV lacks a keyboard, and the CDTV remote control's ar- 
row keys function as move-the-mouse keys. They don't nec- 
essarily return "nice and programmer-friendly" arrow (raw) 
key events. 

An acceptable method is checking your message list ahead
of time (prescanning) and removing messages from a message
port before they are retrieved by GetMsg(). This is based on
the idea that once a message is sent with PutMsg() it is the
property of the task that owns the message port. Plus, it has
the added advantage of relieving the application of the
drudgery of maintaining a list of incoming events. Somehow
this has to be performed if you want to respond to each
incoming message in the order that the user generated them.
(The messages might be a list of many menu-equivalents or
gadget events.) Exec does this for you, automatically.

Because you own the message list on your ports, you may
delete individual messages without actually calling
GetMsg(); simply use Exec's REMOVE macro. REMOVE can
substitute for GetMsg(), but you must be sure to still use
ReplyMsg() for each one.
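
The prescan-and-remove idea can be modeled in Python with a simple
queue standing in for the port's message list. This is a sketch of the
logic only: the dictionaries, class names, and codes are illustrative
stand-ins, not real Exec or Intuition structures.

```python
from collections import deque


def prescan(port, wanted_class):
    """Look for a class of message without removing anything."""
    return any(msg["class"] == wanted_class for msg in port)


def remove_class(port, unwanted_class):
    """Unlink every message of one class and hand them back so that
    each can still be replied to. The messages left behind keep
    their arrival order."""
    removed = [msg for msg in port if msg["class"] == unwanted_class]
    kept = [msg for msg in port if msg["class"] != unwanted_class]
    port.clear()
    port.extend(kept)
    return removed


def demo_port():
    """A pending-message list with illustrative contents."""
    return deque([
        {"class": "MOUSEMOVE", "code": 0},
        {"class": "RAWKEY", "code": 0x45},
        {"class": "MOUSEMOVE", "code": 0},
        {"class": "MOUSEBUTTONS", "code": 0x68},
    ])


if __name__ == "__main__":
    port = demo_port()
    print(prescan(port, "RAWKEY"))
    flushed = remove_class(port, "MOUSEMOVE")
    print(len(flushed), [m["class"] for m in port])
```

In the real thing, every message returned by the removal step would
still have to be passed to ReplyMsg().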

Another nice thing about messages and using only one 
port for all of them (a design trick) is that you maintain syn- 
chronicity — you can respond to events in the same order as 
they occur. Signals have no such capability. Signals tell you 
only if something happened since you last cleared the sig- 
nal. If multiple signals occur, you have no way of knowing 
the order in which they occurred. A signal is simply a bit in 
your task structure. It will not tell you if something has
happened more than once. It will tell you only that
something has happened. When using message port signals, you
never know if more than one message has been received.
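
The difference between a signal bit and a message queue is easy to see
in a toy model. The Python class below is purely illustrative (it is
not how Exec is implemented): the flag records only that something
arrived, while the list records what arrived and in what order.

```python
class Port:
    """Toy message port: one signal flag plus an ordered queue."""

    def __init__(self):
        self.signal = False
        self.queue = []

    def put(self, msg):
        """Queue a message and raise the signal flag."""
        self.queue.append(msg)
        self.signal = True     # setting an already-set bit adds nothing

    def clear_signal(self):
        self.signal = False

    def drain(self):
        """Return every pending message, oldest first."""
        pending, self.queue = self.queue, []
        return pending


if __name__ == "__main__":
    port = Port()
    port.put("menu pick")
    port.put("gadget click")
    print(port.signal)    # one flag, however many messages arrived
    print(port.drain())   # the queue still knows both, in order
```

The practical consequence is that, after waking on a signal, you
should keep reading messages until the port is empty rather than
assume there is exactly one.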

FREEONEREMEMBER ROUTINE 

The Exec and Intuition ROM libraries provide two
memory-allocation schemes. The memory-allocation calls are very
similar, but Exec's FreeMem() and Intuition's FreeRemember()
are different: FreeMem() frees one chunk of memory,
while FreeRemember() frees potentially many chunks of
memory. While Exec does provide routines that deal with
memory management in a more sophisticated way (the
Alloc/FreeEntry() routines, and so on), Intuition's routines
are more elegant. The standard Intuition library calls,
however, do not provide for selectively freeing just one chunk
of memory. You can invoke FreeRemember() only for all the
memory-allocation chunks.

In ordinary programming, Intuition's Alloc/FreeRemember()
routines are rarely used, because they provide no way to
deallocate one specific chunk of memory. The code below,
FreeOneRemember(), is a routine that frees just one memory
allocation from an Intuition Remember list. It is a high-level
routine; cycle-counting is ignored and utility and code
compactness are emphasized. The 680x0 CPU family is very
adept at list handling.

Essentially, FreeOneRemember(): 

• Finds the memory in an Intuition Remember list.
• Removes the Remember chunk from the list.
• Creates a dummy Remember structure/pointer.
• Does a FreeRemember() of the dummy structure.
• Deletes the dummy structure.

This routine retains the FreeRemember() advantage of being
able to free all memory allocations at once, which is
handy when your application quits or aborts. With this ca- 
pability, you do not have to call many separate "free-some- 
memory" routines, and your code runs faster. The routine 
provides a new advantage in that you can deallocate just one 
chunk of memory, and still not have anything more than 
the standard Remember structure overhead. Also, when us- 
ing FreeOneRemember(), you need not keep track of the size 
of the memory chunk, which Exec's FreeMem() requires as 
an argument. You simply pass it one argument — the address 
of the memory. 

If you want to maintain separate Remember lists, you can 
customize FreeOneRemember() to require two arguments: a 
memory address and a Remember list address. 

FreeOneVariable() is a slightly higher-level routine: You
pass it the address of a LONGWORD variable that contains 
a pointer to the memory. A NULL pointer is fine. It's a pain- 
less way to ensure that a piece of memory is deallocated. 

FreeOneVariable: ;A0=Address of variable to free, RETURNS a0
                 ;unmolested
    move.l (a0),d0 ;address of memory to free
    clr.l (a0) ;(say it's gone...)

FreeOneRemember: ;D0=Address of memory to free
    tst.l d0 ;address to free mem?
    beq.s finalend_f1r ;none...get outta here
    movem.l d0/a0/a1/a6,-(sp) ;DESTROYS D1

    move.l #RememberKey,a0 ;address of Remember list (CUSTOMIZE)

f1restart: ;TOP OF MAIN/SCANNING LOOP
    move.l a0,a1 ;a1=save prev for de-linking
    move.l (a0),d1 ;d1=rm_NextRemember
    beq.s endof_f1r ;nothing in list (why?...)
    move.l d1,a0 ;a0=next/current

    cmp.l rm_Memory(a0),d0 ;this chunk a ptr to our memory?
    bne.s f1restart ;nope...reloop till endalist

;FOUND IT, FREE THE CHUNK
f1gotm: ;REMOVE FROM REMEMBER LIST
    move.l (a0),(a1) ;prev<==NEXT after me ;de-link me...a0=me, a1=prev
    clr.l (a0) ;rm_NextRemember(a0) ;points to no one, now.

;FREE THIS 'REMEMBER' STRUCT & ITS MEMORY CHUNK
;USE/BUILD DUMMY REMEMBER KEY/LIST
    move.l a0,-(sp) ;temp pointer to a remember struct
    lea (sp),a0 ;'pointer to a remember pointer'...
    CALLIB Intuition,FreeRemember
    lea.l 4(sp),sp ;de-stack temporary remember struct

endof_f1r:
    movem.l (sp)+,d0/a0/a1/a6
finalend_f1r:
    rts
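
For readers who find the list surgery easier to follow in a high-level
language, here is the same sequence of steps modeled in Python. It is
a sketch under stated assumptions: the classes are simplified
stand-ins for Intuition's Remember structure, free_remember() merely
imitates what FreeRemember() does to a list, and the `freed` list
exists only so that the result can be inspected.

```python
class Remember:
    """Simplified stand-in for one Remember node."""

    def __init__(self, memory, next_remember=None):
        self.memory = memory
        self.next_remember = next_remember


class RememberKey:
    """Stand-in for the pointer that heads a Remember list."""

    def __init__(self):
        self.next_remember = None


def free_remember(key, freed):
    """Imitation of FreeRemember(): release every node on the list."""
    node = key.next_remember
    while node is not None:
        freed.append(node.memory)
        node = node.next_remember
    key.next_remember = None


def free_one_remember(key, memory, freed):
    """Find one allocation, unlink it, and free it by way of a
    one-node dummy list. Returns False if it was not found."""
    prev, node = key, key.next_remember
    while node is not None:
        if node.memory == memory:
            prev.next_remember = node.next_remember  # de-link the node
            node.next_remember = None                # it points to no one
            dummy = RememberKey()                    # temporary list head
            dummy.next_remember = node
            free_remember(dummy, freed)
            return True
        prev, node = node, node.next_remember
    return False


if __name__ == "__main__":
    key = RememberKey()
    key.next_remember = Remember("A", Remember("B", Remember("C")))
    freed = []
    print(free_one_remember(key, "B", freed), freed)
```

Because the unlinked node is handed to the ordinary free routine on a
list of its own, the main list never needs any special-case code.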

QUICKCOPY ROUTINE

Many graphics programs need an undo buffer. Of course,
on the slowest, unaccelerated generic machine, the Blitter is
the fastest engine to use when copying a bitmap. When lots
of fast RAM and a faster CPU are available, however, there
are advantages to copying to an undo buffer that exists in fast
RAM. This involves using the CPU, because the Blitter can
work only in chip RAM.

The Exec's memory-copy routine is fairly efficient, but 
faster routines are possible with a little bit of effort. Exec's 
routine uses the MOVEM trick, but it's in a very small, tight 
loop. A faster routine results from unrolling the innermost
loop only far enough to still fit inside a '020's cache. Macros
can help with the unrolling. In this case, a macro is used in 
the innermost loop. The macro expands to many instructions, 
but the code remains clear because the start and end of the 
loop are visible on an editor screen. (With the macro, the loop 
code is only a few lines long.) 

QUICKCopy: ;d0=count, a0=from address, a1=to adr
    movem.l d0-d7/a0-a6,-(sp)
    cmp.L #48+1,d0
    bcs.s qclast
    move.l #384,-(sp)
    bra.s qccklp
copyfirstloop:
    copy384
qccklp:
    sub.L (sp),d0
    bcc.s copyfirstloop
    add.L (sp)+,d0


A macro, copy384, copies 384 bytes using MOVEM.L
instructions. All registers are filled with copy data except for
A0 and A1, which contain the addresses, and D0, which
contains the current copy count (-384). Note that the complete
code is in the Purdon drawer on disk; the above is a fragment
only.

Using the Exec copy routine will work, but my custom 
routine will outperform it for large copies, which most graph- 
ics code seems to need. 

QUICKCopy requires at least WORD alignment of both 
source and destination addresses. It will perform better (on 
accelerated machines) if LONGWORD alignment is used. 
Byte-string copies, where alignment is not guaranteed, are 
best handled by another routine. 
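
The control flow of the fragment (subtract 384, loop while the count
has not gone negative, then add 384 back and deal with what is left)
can be modeled in Python. This is a sketch of the bookkeeping only:
the slices stand in for the MOVEM.L bursts, the tail handling is an
assumption because the fragment does not show it, and the names are
hypothetical.

```python
BIG_BLOCK = 384   # bytes moved per pass of the unrolled loop


def quick_copy(src, dst, count):
    """Copy `count` bytes from src to dst in 384-byte blocks, then
    copy whatever is left over. Returns the number of big passes."""
    done = 0
    big_passes = 0
    while count - done >= BIG_BLOCK:          # like sub.L / bcc.s
        dst[done:done + BIG_BLOCK] = src[done:done + BIG_BLOCK]
        done += BIG_BLOCK
        big_passes += 1
    dst[done:count] = src[done:count]         # tail of fewer than 384 bytes
    return big_passes


if __name__ == "__main__":
    source = bytearray(range(256)) * 4        # 1,024 bytes of test data
    target = bytearray(len(source))
    print(quick_copy(source, target, 1000))
    print(target[:1000] == source[:1000])
```

A 1,000-byte copy therefore makes two big passes (768 bytes) and leaves
232 bytes for the tail code.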

To see how these routines can work together, study the 
demo program in the disk's Purdon drawer. Combining 
FreeOneRemember(), CHECK4MSG, QUICKCopy, and 
QuickPlot, the demo plots a bouncing ball on the top half of 
the screen, and copies the top half to the bottom half after 
each new pixel is plotted. 

After you put in the extra effort to write efficient assembly
routines, consider making your routines available for other
programmers to use. Package the code into a library or
device, or at least make it accessible from ARexx. Your fellow
assembly programmers (like me) definitely will thank you for
the help.

Jamie Purdon is the author of NewTek's DigiPaint and
ToasterPaint software. He's made a career of the Amiga, programming in
68000 assembly language for over five years. Contact him c/o The 
AmigaWorld Tech Journal, 80 Elm St., Peterborough, NH 
03458, or on BIX (jamiep).

Custom Interfaces
With ARexx 



By Marvin Weinstein 



AREXX, AN INTERPRETED language, is not well suited 
to writing computationally intensive programs. It is, howev- 
er, perfect for creating interfaces to programs that are not 
particularly user friendly. Implementing a user-friendly, ful- 
ly Amiga-style front-end to a program requires access to the 
graphical user interface (GUI) not provided by the standard 
ARexx support libraries that come with OS 2.0.

Fortunately, you can obtain a number of ARexx-shared li- 
braries and manipulate the Amiga's GUI from within an ARexx 
program. Some of these libraries are part of commercially sup- 
ported packages; others are freely distributable. In a previous 
article (see "Extending ARexx," p. 18, Oct. '91), I explained how 
to use the automatic requesters in Willy Langeveld's freely dis- 
tributable rexxarplib.library to interact with the user. 

While automatic requesters are simple to set up and can 
provide wonderful results, designing a really nice interface 
requires more flexibility than the requesters provide. If you 
want to go the extra mile, rexxarplib provides tools for cre- 
ating an interface from the ground up. This article designs 
such an interface for the archiving program, Lharc — a perfect 
example of a useful program that needs a simpler interface. 

Because I use Lharc only sporadically and have a terrible
memory, I continually have to reread the help information, and
I need several attempts to get the correct preface symbols in
the correct case and in the correct order. My solution had
been to limit myself to extracting the contents of an archive 
and ignoring Lharc's other capabilities, but what I really 
wanted was an easy-to-use interface to Lharc. With the more 
advanced features of rexxarplib I created such an interface. 
(If you like my interface, then you get not only a lesson in 
ARexx programming, but also a useful utility. If you dislike 
my interface, then by the end of this article you will have all 
the tools required to rebuild it to fit your needs.) 

DESIGN REQUIREMENTS 

I decided that the Lharc interface must be a gadget-laden 
panel that opens on Workbench. It would have to provide a 
way to set the archive name by means of a file requester or a 
string gadget. Similar methods would be employed to define 
the destination directory for unpacking an archive and to set 
the search pattern used in the process. There would be a col- 
umn of buttons to set Lharc's switches, and these would tog- 
gle on and off to indicate their current state. In addition, there 
would have to be a second column of buttons to execute 
Lharc commands. 

ABOUT THE PROGRAM 

In my previous article, I presented a program called
fastmenu.rexx that creates a window endowed with button
gadgets. The program in this issue, rexxlharc.rexx, is a
straightforward, but more complicated, extension of the first one.
Although rexxlharc.rexx uses more rexxarplib commands,
complications arise because this program handles all
messages generated by clicking on button gadgets, whereas the
gadgets used in fastmenu.rexx communicated directly with
the REXX host process.

This article will touch on the most important aspects of 
rexxlharc.rexx. It is meant to be read in conjunction with the 
listing in the accompanying disk's Weinstein drawer. The 
drawer also contains a copy of rexxarplib.doc, a complete, 
albeit telegraphic, description of the syntax of all the com- 
mands provided by rexxarplib.library. With the explana- 
tions given here and the comments included in the listing, 
the parts of rexxarplib.doc having to do with opening win- 
dows and endowing them with gadgets should be accessi- 
ble to you. 

INSTALLING THE PROGRAM 

To run this program, first copy the file rexxlharc.rexx into 
your rexx: directory. If you lack current versions of rexx- 
arplib.library and arp.library, copy them from the disk into 
your libs: directory. Finally, copy Lharc to your C: directory or 
elsewhere in your search path. Once this is done, you can type: 

rx rexx:rexxlharc 

If you do not wish to tie up a CLI, type: 

run rx rexx:rexxlharc 

Note that if you are running WShell, there is no need for 
the rx command. 

WHAT THE GADGETS DO 

Once the program is running, it can be shut down by click- 
ing on the close gadget in the upper-left corner of the win- 
dow. The gadget labeled Archive Name: causes a file re- 
quester to open. If you want to create a new archive, use this 
requester to define both its path and filename. You can omit 
the .lzh at the end of the archive's name; the program will
append it. You can, of course, avoid the file requester and
type the full name of the archive directly into the adjacent
string gadget.

To select the directory where files will be placed when un- 
packing an archive, click on the gadget labeled Destination:. 
This will open another file requester. As with the Archive 
Name: gadget, you can ignore it and type the name of the des- 
tination directory directly into the adjacent string gadget. 






(Note that the form of this requester will differ under Ami- 
gaDOS 1.3, where rexxarplib uses the ARP file requester, and 
2.0, where it uses the ASL file requester. See the listing for 
comments.) 

The gadgets that appear in the column labeled "Switches" 
are used to set Lharc options; highlighting indicates options 
currently in effect. Some of Lharc's switches require argu- 
ments. Clicking on the associated gadget brings up an auto- 
matic requester that lets you supply the required information. 
Once the desired switches have been set, you can click on a 
command to launch an instance of Lharc. Each click on a
command gadget causes a new CLI window to open. All 
CLIs are fully interactive, so you can abort any instance of 
Lharc by activating the appropriate window and typing 
CTRL-C. 

HANDLING THE MESSAGE PORT

Let's consider a skeletal form of rexxlharc.rexx to see how 
the program handles rexxarplib host-generated messages: 

/** rexx:rexxlharc.rexx - Version 1.0 **/
call addlib('rexxarplib.library',0,-30,0)

/*
 * Before doing anything open your message
 * port. If this fails exit cleanly
 */
testport = openport(LHARCPORT)

/*
 * Asynchronously launch a string ...
 */
address AREXX ,
   "'call createhost(LHARCHOST,LHARCPORT)'"

call MakeWindow()

/*
 * Everything is ready, ...
 */
quitflag = 0
do forever
   if quitflag = 1 then leave
   t = waitpkt(LHARCPORT)
   /*
    * This loop handles all currently queued
    * messages then goes to sleep again.
    */
   do ff = 1
      p = getpkt(LHARCPORT)
      if c2d(p) = 0 then leave ff
      command = getarg(p)
      t = reply(p,0)
      select
         when command = CLOSEWINDOW then do
            call CloseWindow(LHARCHOST)
            quitflag=1
         end
         when command = LISTARCHIVE then do
            ...
         end
         otherwise nop
      end
   end
end
exit


SALIENT FEATURES 

The program begins by adding both rexxsupport.library
and rexxarplib.library to ARexx's internal list (a process
discussed in the Oct. '91 article). The first important fragment of
code is the call to ARexx's openport() function, which creates
a public message port with the name LHARCPORT. Here it
will receive messages from its rexxarplib host. The code that
follows checks to see if ARexx succeeded in opening the
message port; if it failed to open, the program exits.

If all went well, the next major step is to create the
rexxarplib host, LHARCHOST. A rexxarplib host is a separate
process that knows how to open a window and endow it
with menus and gadgets. The host monitors IDCMP messages
generated by clicking on gadgets attached to the window.
When a message arrives, it creates an ARexx message,
fills its slots with the information specified in the call to
AddGadget(), and then sends it off to LHARCPORT.

Rexxarplib hosts are created with the createhost() function,
which accepts two arguments. The first specifies the name the
host should use for its own public message port; the second
is the name of the port to which it will forward the ARexx
messages it constructs. Enclose the call to this function in two
sets of quotes and preface it by the address AREXX instruction
(simply inserting it in the program at this point would
halt execution). For historical reasons the call to createhost()
does not return until the host closes down. To avoid this
lock-up, send the call as a string program to AREXX (thus
automatically running it as an asynchronous process). Two sets
of quotes are needed: The REXX interpreter eats the first set in
parsing the line, and the second set lets AREXX recognize the
string as an in-line ARexx program.

Once LHARCHOST exists, you can tell it to open a window
and then adorn it with various gadgets. This is done through
the subroutine MakeWindow() (discussed below). For now let's
just say that gadgets can be given individual names, and you
can specify that an ARexx message containing such a name
be sent to LHARCPORT whenever the user clicks on the gadget.

The technique used to handle messages that arrive at
LHARCPORT is no different from that used in a C program.
Begin with an outer loop that runs until you explicitly
break out of it by way of the leave instruction. At the top
of the loop, call the ARexx waitpkt() function to go to sleep
until a message arrives at LHARCPORT. If a message is already
waiting in the message port, then the call to waitpkt()
returns immediately; otherwise, it does not return until a
message arrives. When a message does arrive, waitpkt() returns
and control transfers to the inner (do ff = 1) loop. Here
the work of handling the message is accomplished.

The strategy in this section of the program is to keep
pulling messages from LHARCPORT until no more messages
are pending. This is possible because the getpkt() function
lets you poll the port. If a message is queued at the port,
then getpkt() returns its address. If no messages are waiting,
it returns the string '00000000'x. One way to check for a
message is to use the ARexx function c2d() to convert this string
to a decimal number and compare it to zero. If c2d(p) is
zero, no message exists, so you leave the ff loop. Each time
you leave ff, check the value of quitflag to see if it is time to
shut down. If quitflag is zero, call waitpkt() to send the
program back to sleep until a new message arrives.

It is important to understand just what is meant by a "new
message." Each time you call getpkt(), all messages already
queued at the port become defined as old, and a subsequent
call to waitpkt() ignores the existence of old messages.
Therefore, in order to guarantee that all messages are handled in a
timely fashion, you must clear out queued messages before
going back to sleep.

Assuming that c2d(p) is not zero, use the getarg(p) function
to extract the contents of the message. The general syntax
of getarg() is:

    x = getarg(p,n)

The first argument must be the address of the message
obtained from the call to getpkt(); the second argument is an
integer that can run from 0 to 15. Using x = getarg(p) without
a second argument is equivalent to x = getarg(p,0).

To understand the meaning of the second argument, you
have to know that an ARexx message has 16 slots that can
contain independent strings. When you use the AddGadget()
function, you tell LHARCHOST the information to be
loaded into the ARexx message it will send to LHARCPORT.
In fact, you can specify that different information be
placed in each of the 16 available slots in the ARexx message.

As you will see when we examine the subroutine MakeWindow(),
the message associated with a button gadget contains the
name of the gadget in its zero slot; all other slots are empty.
In the case of a string gadget, the name of the gadget appears
in the zero slot, and the first slot contains the current contents
of the gadget. A button gadget sends a message to LHARCPORT
whenever it is clicked, whereas a string gadget returns a message
only when a carriage return is hit after the gadget has been
activated for input. As the skeletal listing above shows, the
value of the variable command (the string contained in the
zero slot of the message) is used to determine what subsequent
actions to take.



Note that after a message is received using the getarg()
command, it must be answered using the reply(packet,rc) function,
where packet is the address of the message returned by
the call to getpkt(), and rc is a return code, which in this case
should always be zero.
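
For readers who think in C, the slot layout described here can be pictured with a short portable sketch. It is not Amiga code and does not use the real ARexx message structure: the struct, the function names (sketch_getarg, sketch_is_command, sketch_reply), and the choice to read an empty slot as an empty string are all assumptions made for illustration. It shows only the idea of 16 independent string slots, slot zero as the command name, and a reply recorded for each message.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_SLOTS 16   /* the article describes 16 slots, numbered 0 to 15 */

/* Hypothetical stand-in for a message; NOT the real ARexx structure. */
struct sketch_msg {
    const char *slot[SKETCH_SLOTS];  /* independent strings, NULL when empty */
    int replied;                     /* set once the message is answered */
    int rc;                          /* return code passed with the reply */
};

/* Return the string in slot n; an empty or out-of-range slot reads as "". */
const char *sketch_getarg(const struct sketch_msg *msg, int n)
{
    if (msg == NULL || n < 0 || n >= SKETCH_SLOTS || msg->slot[n] == NULL)
        return "";
    return msg->slot[n];
}

/* Compare the gadget name in slot zero against a command name. */
int sketch_is_command(const struct sketch_msg *msg, const char *name)
{
    return strcmp(sketch_getarg(msg, 0), name) == 0;
}

/* Record that the message was answered with the given return code. */
void sketch_reply(struct sketch_msg *msg, int rc)
{
    msg->replied = 1;
    msg->rc = rc;
}
```

In this picture a button gadget fills only slot zero, while a string gadget such as ARCHIVENAME also fills slot one with its current contents.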

Failure to faithfully reply to all packets received can result
in disaster. Examination of the skeletal listing shows that
when a CLOSEWINDOW command is received (when the
user clicks on the window's close gadget) you're performing
two operations only. First, you're telling LHARCHOST
to close its window by calling CloseWindow(LHARCHOST).
Second, you're setting the value of quitflag to 1. This allows
the ff = 1 loop to continue running until all waiting messages
have been replied to.
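
The drain-until-empty strategy can also be sketched in portable C. This is only a simulation under stated assumptions: sim_port, sim_put, sim_getpkt, and sim_drain are hypothetical names, the "port" is a small in-memory queue, and the blocking behavior of waitpkt() is not modeled. What it does show is the rule from the text: keep pulling messages until none remain, reply to every one, and merely set a flag on CLOSEWINDOW so that the rest of the queue is still answered.

```c
#include <stddef.h>
#include <string.h>

#define PORT_CAPACITY 8

/* Hypothetical in-memory stand-in for a message port. */
struct sim_port {
    const char *queue[PORT_CAPACITY]; /* queued command names */
    size_t head;                      /* index of the oldest message */
    size_t count;                     /* number of queued messages */
    int replies;                      /* how many messages were answered */
};

/* Queue a command; returns 0 if the simulated port is full. */
int sim_put(struct sim_port *port, const char *command)
{
    if (port->count == PORT_CAPACITY)
        return 0;
    port->queue[(port->head + port->count) % PORT_CAPACITY] = command;
    port->count++;
    return 1;
}

/* Poll the port: the oldest command, or NULL when nothing is waiting. */
const char *sim_getpkt(struct sim_port *port)
{
    const char *command;

    if (port->count == 0)
        return NULL;
    command = port->queue[port->head];
    port->head = (port->head + 1) % PORT_CAPACITY;
    port->count--;
    return command;
}

/* Handle every queued message before "going back to sleep".
 * Returns the quit flag: 1 if CLOSEWINDOW was seen, otherwise 0. */
int sim_drain(struct sim_port *port, int *handled)
{
    int quitflag = 0;
    const char *command;

    while ((command = sim_getpkt(port)) != NULL) {
        port->replies++;              /* every packet gets a reply */
        (*handled)++;
        if (strcmp(command, "CLOSEWINDOW") == 0)
            quitflag = 1;             /* keep draining; do not leave early */
    }
    return quitflag;
}
```

Note that the quit flag is tested only after the queue is empty, which mirrors the way the skeletal listing checks quitflag at the top of the outer loop.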

Note that the full listing does not contain any code to
handle commands corresponding to the names of the string
gadgets. This is because you read these gadgets at the last
moment, before executing a Lharc command, to avoid problems
that would result if the user didn't type a carriage return
when he finished typing the string. Some sections of the
select construct handle commands that require the program to
get additional information from the user. To do this, you
utilize calls to the postmsg(), request(), and getfile() functions.
(The first two functions were covered in my last article;
however, the getfile() function is new. See the comments in the full
listing for an explanation of its syntax.)

OPENING THE WINDOW 

Opening a window and adorning it with gadgets requires
only the OpenWindow() and AddGadget() functions. In this
program, you use two slightly different forms of the call to
AddGadget(), namely:

    call AddGadget(hostname,x,y,gadgetname,gadget text,"%d")

and

    call AddGadget(hostname,x,y,gadgetname,gadget text,"%d%1%g",length)

The first form of the call to AddGadget() is familiar from
fastmenu.rexx, but the second is new. In both calls the
variables hostname, x, y, gadgetname, and gadget text specify
the rexxarplib host to which the command will be sent, the
gadget's location in its window, the name to be associated
with the gadget, and the default text to appear in the gadget.

In the second call, the %d%1%g
string is new. It says that whenever the 
gadget is activated, the rexxarplib host 
should forward an ARexx message 
that contains the name of the gadget in 
its zero slot and the contents of the 
string gadget in its first slot. (See the 
full listing and rexxarplib.doc for a 
complete specification of the way in 
which report strings can be construct- 
ed.) 

The last argument, an integer, serves 
a dual purpose. First, its existence tells 
the rexxarplib host that this is to be a 
string gadget and not a button gadget. 
Second, it specifies the length of the 
string gadget. You will also find calls to the SetReqColor(), 
WindowText(), and SetGadget() functions in this program. 
These commands were not discussed before, and there are 
some points concerning their usage that should be made here. 

NEW COMMANDS 

SetReqColor(host,pen type,color) tells LHARCHOST
which colors to use when rendering gadgets, borders, menus,
and so on. Possible pen types are BLOCK, SHADOW, DETAIL,
BACKGROUND, PROMPT, BOX, OKAY, and CANCEL.
For a four-color Workbench screen, the possible color
numbers are 0, 1, 2, and 3. Using SetReqColor() to change
these pens can result in interesting effects. In general,
however, rexxarplib chooses default values that work well under
AmigaDOS 1.3 and with the new 3-D look of OS 2.0. For this
reason you only reset the BACKGROUND pen. This must be
done because you want to produce a window that is
backfilled with a color other than zero.

Note that the rexxarplib host must know all of the requester
colors before it renders a window, and the call to
SetReqColor() must precede the call to OpenWindow().

There are two techniques for having a rexxarplib host
render text in a window. The most flexible method is to use the
command Move(hostname,x,y) to define the position at
which the text is to begin and Text(hostname,string) to cause
the text to be rendered. This method, however, can be slow
and unnecessarily complex for our purposes. A simpler
method is to use a call to the WindowText() function. The
syntax of this call is:

call WindowText(hostname,string) 



WindowText() begins rendering the string at the top of the
window and continues until it is done. It renders the text
exactly as it appears in the string, including spaces, except that
it recognizes a backslash (\) as the instruction to begin a
new line. Since the entire string is rendered at once, the call
to WindowText() is quite fast. The listing shows that it is
fairly easy to construct a string that places text exactly where you
want it.
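
To see how a single string can describe several lines of window text, consider the following portable C sketch. It does not call rexxarplib and says nothing about how the library renders text internally; window_text_line is a hypothetical helper, written on the assumption that the only special character is the backslash and that spaces are kept exactly as typed.

```c
#include <stddef.h>
#include <string.h>

/* Copy line number `index` (counting from 0) of `text` into `out`,
 * treating each backslash as "begin a new line". Spaces are preserved.
 * Returns 1 if that line exists, 0 if the string has fewer lines.
 * Lines longer than the buffer are truncated to fit. */
int window_text_line(const char *text, size_t index,
                     char *out, size_t outlen)
{
    const char *start = text;
    const char *end;
    size_t len;

    if (outlen == 0)
        return 0;

    /* Skip forward over `index` backslashes to find the line start. */
    while (index > 0) {
        start = strchr(start, '\\');
        if (start == NULL)
            return 0;
        start++;
        index--;
    }

    /* The line runs to the next backslash or to the end of the string. */
    end = strchr(start, '\\');
    len = (end != NULL) ? (size_t)(end - start) : strlen(start);
    if (len >= outlen)
        len = outlen - 1;
    memcpy(out, start, len);
    out[len] = '\0';
    return 1;
}
```

Padding a line with leading spaces, as in the second line of a string such as "Switches\  -x\Commands", is how text gets positioned horizontally without any separate Move() calls.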

REFRESHING STRING GADGETS 

Now I must comment upon the technique you are forced
to use to refresh the contents of a string gadget.
Unfortunately, it is circuitous. There is no rexxarplib function
that lets you change the contents of a string gadget once it
has been added to a window. To change the contents of a
string gadget, you must first remove the gadget and then add
it back. For example, when the user fills in the archive name
in the file requester and clicks on OKAY, the string gadget
called ARCHIVENAME is updated. This is accomplished by
successive calls to RemoveGadget() and AddGadget().
Note that, while the call to RemoveGadget() tells LHARCHOST
to remove the gadget from its list, it does not remove the
gadget imagery from the window. If necessary, this must
be done separately, but since you are rendering exactly the
same gadget image as before, you can skip this step.

READING STRING GADGETS 

Finally, I would like to comment on the subroutine
GetVar(), which is called by GetStrings(). This is a
general-purpose routine that I use in many programs to find the current
contents of a string gadget. The routine begins with a call to
the ReadGadget(host,gadgetname) function, which tells the
rexxarplib host to read the contents of the indicated string
gadget, put those contents into an ARexx message, and send
that message back to LHARCPORT. Since you want to
process this message within the subroutine, simply call
waitpkt() immediately after issuing the call to ReadGadget() and
process the message when it arrives. As always, check that
you have gotten a valid packet address before calling
getarg(); then reply() to the message.

IN CLOSING 

I hope that my articles have served as a useful introduction
to some of the possibilities open to you through
rexxarplib.library. If you read rexxarplib.doc, you will see that we have
only scratched the surface. Things become especially
interesting when you start using these tools in conjunction with
commercial applications. Give them a try. ■



Marvin Weinstein uses ARexx and REXX extensively in his 
work at the Stanford Linear Accelerator. Write to him c/o The 
AmigaWorld Tech Journal, 80 Elm St., Peterborough, NH 
03458, or contact him on BIX (tmveinstein). 
The Quest for Speed

Computer System Associates and RCS Management have 
joined GVP and Progressive Peripherals & Software in the 
'040 board race. Besides a 25 MHz 68040, the 40/4 Magnum 
from CSA (7564 Trade St., San Diego, CA 92121, 619/566- 
3911) boasts one to 64 megabytes of 32-bit RAM, a DMA SCSI 
controller, a parallel port, two serial ports, a 32-bit expansion 



bus, and room to mount a hard drive. The Fusion-Forty from 
RCS Management (120 McGill St., Montreal, Quebec, Cana- 
da H2Y 2E5, 514/871-4924) promises a 25 MHz 68040, four 
to 32 megabytes of 32-bit RAM, a hardware switch that lets 
you return to your original processor, asynchronous design, 
and OS 1.3 compatibility. 



Faster 500s

Microbotics recently announced a new A500 68030
accelerator, the VXL-30 (25 MHz, $399; 40 MHz, $629). Based on
the 68EC030 chip (which is identical to the 68030, except that
the Paged Memory Management Unit is not available),
it allows A500 owners to add an internal 32-bit
accelerator to their systems. No one need feel left out; A2000
owners can also use the VXL. To install the VXL, you plug it in
the 68000 CPU socket and move the native 68000 to a socket
on the VXL board. This allows you to reboot into 68000 mode
when necessary. Additional options for the VXL include a
RAM board that holds two or eight megabytes of fast-page
mode RAM ($379) and a 68881 or 68882 math coprocessor
that can be clocked at speeds up to 60 MHz ($200). For those
users who want a PMMU, 68030 versions of the board are
available on request. Simply contact Microbotics Inc., 1251
American Parkway, Richardson, TX 75081, 214/437-5330.

Motor Control

Sporting programmable acceleration and deceleration,
the MCB-4 Stepping Motor Controller/Driver Board can
simultaneously control four four-phase stepping motors at
two amps per phase. The board connects to your Amiga via
the RS-232 port and offers stepping rates of up to 10,000 steps
per second and 16.7 million steps per move. It has
opto-isolated home and four limit inputs; plus the power section
is opto-isolated from the control section in an attempt to
reduce noise. Nonvolatile memory for motion control
variables and an end-of-motion indicator round out the
package. MCB-4 sells for $695 and is available from
Advanced Control Systems Corp., Old Mine Rock Way,
Hingham, MA 02043, 617/740-0223.

The MCB-4 board controls up to four stepping motors simultaneously.

In addition to source and executables for the article examples,
you will find:

CATS' Debugging Tools: Enforcer, Mungwall, debug.lib, ddebug.lib
DiskSpeed 4.0: How fast is your hard drive?
Libraries & Custom Printer Drivers

This nonbootable disk is divided into two main directories,
Articles and Applications. Articles is organized into
subdirectories containing source and executable for all routines and
programs discussed in this issue's articles. Rather than
condense article titles into cryptic icon names, we named the
subdirectories after their associated authors. So, if you want
the listing for "101 Methods of Bubble Sorting in BASIC," by
Chuck Nicholas, just look for Nicholas, not 101MOBSIB. The
remainder of the disk, Applications, is composed of
directories containing various programs we thought you'd find
helpful. Keep your copies of Arc, Lharc, and Zoo handy;
space constraints may have forced us to compress a few files.

Unless otherwise noted in their documentation, the
supplied files are freely distributable. Read the fine print
carefully, and do not under any circumstances resell them. Do be
polite and appreciative: Send the authors shareware
contributions if they request it and you like their programs.

Before you rush to your Amiga and pop your disk in, make
a copy and store the original in a safe place. Listings
provided on-disk are a boon until the disk gets corrupted. Please take
a minute now to save yourself hours of frustration later.

If your disk is defective, return it to AmigaWorld Tech
Journal Disk, Special Products, 80 Elm St., Peterborough, NH
03458 for a replacement.



In Charge On-Line 



A collection of interrelated 
program modules, DLG Pro- 
fessional ($199) takes a new 
approach to bulletin-board 
operating systems. Because 
DLG is built on the Shell and 
is ARexx compatible, you 
can incorporate CLI- and 
ARexx-based programs into 
DLG's bulletin-board setup. 



DLG Professional promises 

to support 65,000 users, 255 
user levels, 9999 message ar- 
eas, 9999 file areas, multiple 
lines, and conferencing. DLG 
builds on the standard list of 
BBS features with the likes of 
message broadcasting, tag- 
ging, bundling, and down- 
loading; off-line reading, and 



sysop-configurable file- 
transfer protocols. Plus, it is 
compatible with FidoNet 
electronic mail and echomail 
conferencing protocols, as 
well as UseNet. For complete 
details, contact TelePro,
20-1524 Rayner Ave.,
Saskatoon, Sask., Canada S7N 1Y1,
306/665-3811.



Add a CD-ROM System 

Need CD-ROM drive control for your project? Consider
CDROM-FS ($50 Canadian), an ISO-9660 and HiSierra file
system. The Developers Toolkit consists of the cdrom.library
of support functions, header and include files, sample source
code for calling cdrom.library functions, two stand-alone
utility programs, linking library modules for Manx and SAS C,
AutoDocs, and sample mountlists. Some of the new features
in this release are support for AmigaDOS 2.0 packets,
extended attribute records for files and directories, Chinon
CDA/CDS/CDX-431 drive support, CDDACtrl audio
control, timecode displays for CDDACtrls, and optional audio
notification at end of track for CDDACtrls. For a thorough
description, contact Canadian Prototype Replicas, PO Box 8,
Breslau, Ontario, Canada N0B 1M0, 519/884-4412.



A New Type 

Compatible with all Ami- 
gas, the KB-Talker ($69.95) 
is a universal keyboard 
adapter that allows you to 
connect any PC/AT-compatible
keyboard to your Amiga.
Completely transparent,
it does not require you to 



make any software changes 
or additions. A500 owners, 
however, need a special 
adapter cable and must 
make some slight modifica- 
tion to the machine's case. 
KB-Talker also supports a 
dual keymapping configura- 



tion that can be toggled on 
the fly, making it a standard 
PC keyboard for use with 
Amiga Bridgeboards. Direct 
your inquiries to Co-Tronics
Engineering, PO Box 5146, 
Glendale, AZ 85312-5146, 
314/429-2644. 



Graphic Improvements 

New from GfxBase is the GDA-1 (512K, $495; 1024K, $649), 
a Zorro II 16-bit graphics card with an eight-bit display. Res- 
olutions range from as low as 640x480 noninterlaced with 256 
colors (out of a palette of 16.7 million) to, for the one megabyte 
version, 800x600 or 1024x768, each with 256 colors. The dis- 
play architecture uses chunky pixels, not bitplanes. The mem- 
ory is contiguous and the card can double as a standard mem- 
ory card. Current software support includes XWindows for 
AmigaDOS, and it comes with some basic device drivers, 
source code, and utilities. A software emulation of the GDA- 
1 is also available for developers who want to support the 
card. Current plans call for a December 1991 shipping date. 
GfxBase Inc. welcomes developer inquiries at 1881 Ellwell Dr.,
Milpitas, CA 95035, 408/262-1469. 



What's on the Schedule? 



If you or your company 
has a hot new product on the 
way, tell us about it. We'll 
tell the readers. Please send 
your press releases and an- 



nouncements to The Amiga- 
World Tech Journal, 80 Elm 
St., Peterborough, NH 03458, 
or llaflamme on BIX. ■ 
Amiga User Interface
Style Guide 

Rules for the 2.0 look and feel. 

By Dan Weiss 

"AT LAST" WAS the first thought 
that ran through my mind when I saw 
the Amiga User Interface Style Guide, 
written by Dan Baker, Mark Green, 
and David Junod of CATS (with the 
help of many prominent contributors). 
At last, developers have a definitive 
guide to help them make some sense 
of what a "true" Amiga application 
should look and behave like. 

A user-interface guide is not unique 
to the Amiga. Most commercially 
available GUIs (graphical user inter- 
faces) have some sort of interface or 
style guide. The Amiga has been slow 
in gaining one (there is an eight-page 
section in the Amiga ROM Kernel Refer- 
ence Manual: Libraries & Devices, but it 
is vague at best). An interface guide 
should set a standard for the look and 
feel of programs that run under the 
described GUI. Now, in conjunction 
with the new user interface 
introduced in Release 2 of the operat- 
ing system, Commodore has 
produced such a guide. For the first 
time, Amiga programmers have a 
tangible guideline to follow. 

The Amiga User Interface Style Guide 
provides clear explanations suitable 
for all readers, from the least to the 
most experienced. Its 200-plus pages 
cover each facet of the user interface in 
detail — not only what a button should 
look like, but also how it should be- 
have under a number of circum- 
stances. This depth sets the book apart 
from similar publications. It is not 
simply a description of the Amiga 
interface, but a unified, exhaustive 
examination of all its parts. If you 
want your product to have the Release 
2 look, this is where it is defined. 

FOR DESIGNERS AND 
PROGRAMMERS 

For the designer creating a new 
program, the guide lays out a tapestry 
of ideas. It is important, though, to
remember that the book is only a 
guide. If you need or want to do some- 
thing different, no one will stop you. 
Of course, if your approach is really 
nonstandard, no one may buy your 
product. Close-up pictures (to the pixel 
level) of key graphical components 
allow you to match exactly many 
facets of the interface that should be 




common to all programs (such as the 
wait cursor). The guide also covers 
issues of layout and design as they 
apply to screens and requesters. With 
this information, you can make sure 
that your program looks like a 2.0 pro- 
gram, but there is more to it than that. 

The book also helps programmers
make sure their software has the
proper feel. Before you implement a
requester, for example, you should
check Chapter Four: Windows and
Requesters to see if your requester
will act the same way as the standard.
Is it draggable? Does it offer the user a
safe way out? Or, if you are putting
up a requester to notify the user that
you were unable to load a module or
library, are you doing it the best way?

Issues of proper implementation are 
very important in the guide. For ex- 
ample, when using the newly created 
cycle gadgets, you must be sure to 
support standard keyboard shortcuts 
and determine whether they are modi- 



fied by the shift key. All major appli- 
cation gadgets, as well as system 
gadgets, are covered in some detail in 
Chapter Five: Gadgets. 

The guide changes direction with 
Chapter Eight: The Shell. It addresses 
the key issues for the parts of the 
Amiga that are not graphic related. 
The Shell (8), ARexx (9), and Prefer- 
ences (12) each has a separate chapter. 
These, along with The Keyboard (10) 
and Data Sharing (11), make up the 
remainder of the instructional part of 
the book. I would subtitle this section 
"The Amiga Way." 

While the first seven chapters cover 
material that is more or less common 
to all GUIs, the final five attempt to 
extol the virtues of that which is 
uniquely Amiga. The chapter on 
ARexx goes farther than any docu- 
ment I have seen in defining a stan- 
dard set of commands. To date, 
ARexx support has often been a case 
of "catch as catch can." The other 
chapters in this section also offer in- 
sight into what Commodore would 
prefer an Amiga product to be like. 
Chapter 12: Preferences is perhaps the
best in the book at explaining why 
you should follow the suggested 
method. 

TELL ME MORE 

Which brings me to the major fail- 
ing of the book — justification. For the 
average reader, being told that long 
menus are bad is enough. But why are 
they bad? That's a very important 
issue. Much research has gone into the 
design of GUIs, as alluded to in the 
first chapter, but never referenced. 
While a fine two-page list of Com- 
modore addresses around the world 
was provided, no bibliography of 
further or supporting texts was in- 
cluded. Seems to me that would have 
filled the 12 blank pages at the end of 
the book nicely. To make up for this 
lack, I have included a short list of 
supplemental reading suggestions on 
the following page. 

As for the design of the book itself, it
is the finest of all the Addison-Wesley
Amiga technical books to date. The
cover is eye-catching, but not
annoying. The layout has an open quality,
and is set in comfortable, eye-pleasing
type. Each page has a wide outside
margin (great for jotting notes) that 
sometimes contains hints or short 
summations, which make scanning for 
a given section very easy. Including 
the appropriate chapter number in 
each page's folio, however, would 
have made the book a little easier to 
navigate. The glossary and the index 
are both excellent, although the index 
could have been in larger type. 

The Human Factor
By Richard Rubinstein and Harry Hersh
Digital Press

The Art of Human Computer Interface Design
Edited by Brenda Laurel

Designing the User Interface
By Ben Shneiderman

... in that authors range from Alan Kay
to Timothy Leary, ...

On the whole, I like the guide. Not
only does it provide some much needed
information, but it is also a pleasure
to use. When you get your copy,
sit down and read it cover to cover. I
did so pool-side, and finished it in no
time at all. In the process I found
myself taking notes, because I had
found answers to several nagging
questions. If you are genuinely
interested in user-interface design, I
suggest you also read the supplemental
books listed in the box above.
Interesting to note is one line of fine print on
the copyright page: "As with all
software upgrades, full compatibility,
although a goal, cannot be guaranteed,
and is in fact unlikely." Oh well,
so much for the rule book. ■

Amiga User Interface Style Guide 
Addison-Wesley Publishing Company 

Reading, MA 01867 
"Pure" Tricks with SAS/C



By Michael Weiblen 



USERS APPRECIATE THE flexibility pure programs offer. 
By using the RESIDENT command, they have better control 
over integrating your program into their environment. This 
article will discuss the advantages of making pure programs 
resident, which factors determine a program's purity, and a 
convenient method for creating pure programs using the 
SAS/C compiler. 

PURITY AND RESIDENT 

When a user makes a program resident, an image of the 
program is loaded from disk and added to the Shell's resident 
list. Later, when the user invokes that program, the Shell first 
searches to see if the program is on the resident list. If it is, 
the Shell can skip the time-consuming process of loading the 
program from disk, and simply begins executing the pro- 
gram image already in memory. When the program finishes 
executing, the program image remains in memory on the res- 
ident list for future reuse. 

Contrast this with the invocation of a nonresident pro- 
gram: The program image is loaded from disk, the program 
executes, and the image is flushed from memory. Clearly, by 
eliminating the load from disk, a resident program can exe- 
cute much faster. 

The important point here is that every invocation of a res- 
ident program will use the same in-memory image. It is vi- 
tal that no invocation do anything to damage that image. 
This characteristic is called being serially reusable. Further- 
more, given the multitasking capabilities of the Amiga, it is 
quite possible that a user might want to run several invoca- 
tions of a resident program simultaneously, so it is equally 
vital that concurrent invocations not interfere with each oth- 
er. This is known as being re-entrant. Only programs that are 
both serially reusable and re-entrant can safely be made res- 
ident. Such programs are said to be "pure." Unfortunately, 
purity doesn't just happen all by itself. 

What, then, is necessary to make a program pure? As- 
suming that the program is not self-modifying (which is a can 
of worms I'm going to avoid entirely), the factor that distin- 
guishes invocations of a program is its data. Therefore, a pure 
program must ensure that the data areas of each invocation 
do not interfere with each other. 

WRITING PURE C 

In C, data storage can be separated into two classes, local 
variables and global variables. Local variables, without the 
static keyword, are allocated by a C function from the task's 
stack. Because each invocation's task is allocated a separate 
stack area by the operating system, you can be sure that any 
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data stored on the stack will be private to that task. Local vari- 
ables, then, cause no problems with a program's purity. 

The problem is with how C allocates a program's global 
variables. Declared outside a function or with the static key- 
word, global variables are stored as part of the program im- 
age, the same image that must be reused and shared among 
all invocations when the program is made resident. Obvi- 
ously, a pure program cannot modify its global variables. 
But what are variables for, except to be modified? The issue 
of purity, then, boils down to how a program resolves this 
conflict regarding global variables. 

The brute-force method of writing a pure program in C is 
to avoid using global variables altogether. Each function al- 
locates all its variables locally, either on the stack or from 
system memory; functions that need to share variables al- 
ways have to pass around pointers to those variables. Writ- 
ing a C program using this method is not difficult, but does 
require a special effort from the beginning. Converting an ex- 
isting C program — one already using global variables — to 
this method can be a tedious and cumbersome task. 
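As a minimal sketch of the brute-force method (in portable C, with hypothetical names such as app_state and count_lines that are not from any SAS/C library), all modifiable state lives in a structure on the stack and is handed around by pointer, so nothing in the program image is ever written:

```c
/* Hypothetical per-invocation state. Nothing here is stored in the
   program image, so each invocation gets its own copy. */
struct app_state {
    unsigned long lines;   /* newline characters seen so far */
    unsigned long chars;   /* characters seen so far */
};

/* Helper functions receive a pointer to the state they may modify
   instead of reaching for a global variable. */
static void count_text(struct app_state *st, const char *text)
{
    for (; *text != '\0'; ++text) {
        st->chars++;
        if (*text == '\n')
            st->lines++;
    }
}

/* Work done by one invocation: the state is a local variable, so it
   sits on the calling task's stack and is private to that task. */
unsigned long count_lines(const char *text)
{
    struct app_state st = { 0, 0 };

    count_text(&st, text);
    return st.lines;
}
```

Two tasks calling count_lines() at the same time cannot disturb each other, because neither call touches storage outside its own stack frame.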

SAS/C provides an alternative, one that places fewer re- 
strictions on your coding style and is much easier to use when 
making an existing program pure. All the grunt work of this 
method is handled for you in a special start-up module, 
cres.o, which you link into your program instead of the nor- 
mal start-up module c.o. Many programs can be made pure 
simply by relinking with cres.o. Other programs, specifical- 
ly those using some of the more powerful features of the 
Amiga and version 2.0 of the operating system, may need mi- 
nor modifications. 

Before we look at how cres.o works and what modifica- 
tions may be necessary to use it, I must say a word about 
when not to make programs pure. It's tempting to make ev- 
ery program pure, because that gives the user the freedom to 
make them all resident to improve their performance. If users 
would never want to make a program resident, however, do 
not make it pure. A pure version of a program will general- 
ly require a little more memory for a solitary invocation than 
an impure version; it's during the subsequent or simultane- 
ous invocations that the advantages of a resident program 
emerge. On the flip side, those naive people who try to make 
everything resident will be quite frustrated when impure 
programs crash; consider your target users carefully. 

CRES.O AT WORK 

Cres.o relies on the base-relative method of accessing glob- 
al variables. Base-relative addressing requires all the global 
data to be merged into a single data hunk (called near data 






Follow these guidelines to make your code safely
re-entrant and serially reusable.
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in SAS/C's manuals) in the executable file. (This technique 
is called base-relative because variables are accessed using a 
16-bit offset relative to the base address of the data hunk, 
which is stored in register A4.) The converse of base-relative 
addressing is absolute (far) addressing, which uses a 32-bit 
absolute pointer to the variable. While it is quite legal to create
a program that uses a combination of absolute and base-relative
access to data, this is not usually permitted in a pure
program (you may use absolute references to data only if the
data is treated as read-only). Cres.o demands that only base-relative
addressing be used to access global variables. When
you link your program with cres.o, BLINK will display warn- 
ing messages for any absolute references it detects. Your pro- 
gram is not pure unless it links without warnings. 

Now let's follow cres.o in action to see how it makes a pro- 
gram pure. When the program is invoked, cres.o allocates a 
block of memory and initializes it to zero to account for the BSS
and copies only as much of the data portion of the program 
image as was initialized at compile time (which is useful in 
reducing the size of the executable image). For example: 

int i = 1;          /* This data would be copied. */

char array[1000];   /* This data is not copied because it is BSS. */

It then loads the address of that private data area into reg- 
ister A4 and begins executing your functions. Effectively, ev- 
ery invocation starts with a fresh copy of the global data area, 
which it accesses relative to its private A4 base address. Those
"global" variables are then actually global to the invocation, 
not the image. When your functions finish, control returns to 
cres.o, which deallocates the data area and exits to the OS. 
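The start-up sequence just described can be imitated in portable C. This is only a sketch of the idea, not the actual cres.o code: the data_template structure and both function names are hypothetical, and a real start-up module would load the returned address into A4 rather than hand it back to a caller.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical description of the data section in the shared image. */
struct data_template {
    const void *init_data;  /* bytes that were initialized at compile time */
    size_t      init_size;  /* size of the initialized portion */
    size_t      bss_size;   /* size of the uninitialized (BSS) portion */
};

/* Build one invocation's private data area: zero the whole block, then
   copy only the initialized portion. Returns NULL if allocation fails. */
void *create_private_data(const struct data_template *t)
{
    unsigned char *area = calloc(1, t->init_size + t->bss_size);

    if (area != NULL && t->init_size > 0)
        memcpy(area, t->init_data, t->init_size);
    return area;
}

/* Called when the invocation's functions have finished. */
void destroy_private_data(void *area)
{
    free(area);
}
```

Each call to create_private_data() yields an independent copy, so a change made by one invocation is invisible to the others and to the template itself.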

The importance of avoiding absolute references to global 
variables should now be apparent. Where a data area is lo- 
cated depends on which invocation it belongs to. Absolute 
references, by their very nature, are not capable of this adjustment.
(In fact, absolute references really access the data
portion of the resident image. We definitely wouldn't want 
to disturb that!) The problem is that absolute references can 
slip into your program in several subtle ways. 

WARNINGS 
Here is a list of trouble spots and methods to avoid them. 

• Never use compiler flag -b0.

The purpose of this flag is to force absolute addressing of 
global variables; the far keyword does the same thing, as ab- 
solute addressing is the same as far. 

• Avoid linking with amiga.lib if possible. 

Amiga.lib contains lots of stuff, including loads of absolute 



references. Particular culprits are the system library stub 
functions. Amiga library functions are accessed as offsets rel- 
ative to the library's base pointer. All the details about how 
to access library functions are encased in these small stub 
functions. The problem is that these stub functions make ab- 
solute references to the global variables that contain their re- 
spective library base pointers. 

As an alternative, SAS/C supports pragmas, which are com- 
mands to the compiler describing everything necessary to di- 
rectly access the library functions. Essentially, pragmas give the 
compiler all the information necessary to generate stub func- 
tions on demand. The advantage is that pragmas will use the 
addressing mode in effect when the source file is compiled, 
which for pure programs must be base-relative addressing. 

Some functions, however, don't have pragmas. For example, 
the "varargs" forms of several of the new OS 2.0 functions do
not exist in a library, hence they don't have pragmas. Actually 
they are small "wrapper" functions in amiga.lib that adjust pa- 
rameters before calling a related library function. The solution 
is simple: Write your own version of these wrapper functions. 
(The Weiblen drawer of the companion disk includes a sample 
of a replacement wrapper function.) 
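To illustrate the general shape of such a wrapper, here is a portable sketch using <stdarg.h>. It is not the sample from the companion disk and not the amiga.lib code; TAG_END, MAX_PAIRS, count_tags_array(), and count_tags() are all hypothetical stand-ins. The wrapper gathers its variable arguments into an array and passes that array to the function that does the real work:

```c
#include <stdarg.h>

#define TAG_END   0UL   /* hypothetical terminator for a tag list */
#define MAX_PAIRS 16    /* arbitrary limit for this sketch */

/* Stand-in for a library function that takes an array of (tag, value)
   pairs ending with TAG_END. Here it just counts the pairs. */
int count_tags_array(const unsigned long *tags)
{
    int n = 0;

    while (tags[n * 2] != TAG_END)
        n++;
    return n;
}

/* Wrapper: collect the variable arguments into a local array, then
   call the array form. Every argument must be an unsigned long. */
int count_tags(unsigned long first_tag, ...)
{
    unsigned long tags[MAX_PAIRS * 2 + 1];
    unsigned long tag = first_tag;
    int i = 0;
    va_list ap;

    va_start(ap, first_tag);
    while (tag != TAG_END && i < MAX_PAIRS * 2) {
        tags[i++] = tag;
        tags[i++] = va_arg(ap, unsigned long);  /* value for this tag */
        tag = va_arg(ap, unsigned long);        /* next tag, or TAG_END */
    }
    va_end(ap);
    tags[i] = TAG_END;
    return count_tags_array(tags);
}
```

Because the array is a local variable, the wrapper makes no reference to global data at all, which is the property a pure program needs.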

Sometimes you just can't avoid linking with amiga.lib; it 
does contain a lot of useful (and pure) routines. Just make 
sure that BLINK does not report any absolute-reference 
warnings, such as: 

Warning! Absolute reference to _SysBase
module: file: lib:amiga.lib

• Never use the standard directives for retrieving the A4 
global data segment pointer. 

These directives (which include the compiler flag -y, the 
function geta4(), and the saveds keyword) load the base ad- 
dress of the global data area into register A4. They are used 
when functions requiring access to global variables are run 
under another task's context. Specific examples include 
spawned tasks, interrupts, and hooks. These directives don't 
work in pure programs because they have no way of know- 
ing where each invocation's global data segment is located. The 
Weiblen drawer contains a set of simple replacements for these 
directives, along with demo programs illustrating their use. 

By following these guidelines and linking with cres.o, you 
should be able to generate pure programs quite easily. ■ 

Michael Weiblen is an engineer at IMSATT Corp., the creators
of AmigaVision. Contact him c/o The AmigaWorld Tech Journal,
80 Elm St., Peterborough, NH 03458, or on BIX (mweiblen).
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Inside MIDI 

A hardware and software tour 
of the music standard. 

By Mike Harris 







OVER THE PAST decade, an overabundance of standards

has appeared in the computer industry. Few, if any, have 
had the universal acceptance of the Musical Instrument Dig- 
ital Interface (MIDI). Perhaps this is because of the unique na- 
ture of its origin. While most industry specifications originate 
in ANSI, ISO, or IEEE committees, MIDI rose from informal 
meetings of leading electronic keyboard manufacturers. Se- 
quential, Roland, Yamaha, Korg, and Kawai met in Tokyo in 
August 1983 and finalized the MIDI 1.0 specification. This 
version remains in use with minor additions and changes to 
the software. The 1.0 spec consists of the hardware required 
and the data format to be used, as you will see. 



THE HARDWARE 

A MIDI device uses asynchronous seri- 
al communication to transfer data be- 
tween equipment. The rate of transfer is 
31.25 Kilobaud with a specified 1% toler- 
ance. The standard 8N1 protocol yields a 
speed of one byte every 320 microseconds 
(10 bits/31,250 bps). IN and OUT ports are 
used for I/O, while a THRU port provides 
a copy of the signal entering via the IN 
port for a secondary device. Included in 
the spec is a diagram of a sample circuit as 
a reference for implementation; however, 
it is not the required method. 
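The timing arithmetic above is easy to check in code. The two helper functions below are hypothetical names for this sketch; they assume one start bit, eight data bits, and one stop bit per frame, as 8N1 implies.

```c
/* Time needed to send one frame, in microseconds. */
double midi_byte_time_us(double bits_per_second, int bits_per_frame)
{
    return 1e6 * bits_per_frame / bits_per_second;
}

/* Upper bound on the number of bytes the link can carry per second. */
long midi_max_bytes_per_second(long bits_per_second, int bits_per_frame)
{
    return bits_per_second / bits_per_frame;
}
```

With 31,250 bps and 10 bits per frame, the first function gives 320 microseconds per byte and the second gives 3,125 bytes per second.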

Rather than using a bipolar voltage as a 
signal as RS-232 does, MIDI uses a 5 mA 
current loop. At the receiving end of the 
signal, an opto-isolator is required (the 
reason for the current loop). The isolator 
must have rise and fall times under 2 mi- 
croseconds and need less than 5 mA to be 
turned on. A Sharp PC-900 and HP 6N138 
(see Figure 2) are the recommended parts, 
but many high-speed opto-isolators are 
satisfactory. The reason for the opto-isola- 
tor is straightforward: to isolate the ground 
between equipment. Ground loops result 
in an audible hum in some audio gear and 
thus must be avoided. 

The IN, OUT and THRU ports of a MIDI
device use five-pin female DIN connectors.
Remember that not every piece of equipment
needs all three ports and that
multiple INs and OUTs are common.
Connectors are another area where 



ground loops can occur. You will notice that OUT and THRU 
have pin 2 grounded while IN does not (see Figure 1). This 
allows the shielding to be grounded on the cables and still not 
cause a ground interconnection. The spec calls for cabling 
under 50 feet long and made of twisted-pair wire with the 
shielding connected to the male connector's pin 2 at both 
ends. As long as the shells of the female connectors are not 
tied to ground, a ground loop will not occur. 

THE DATA FORMAT 

A complete discussion of what is needed to implement 
MIDI software deserves its own article; here we will cover the 
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Figure 1. The MIDI hardware standard. 
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Figure 2. The HP 6N138 opto-isolator. 



general concepts of the MIDI format only. If you require more 
information, you can obtain a copy of the MIDI specification 
for $38 from the International MIDI Association, 5316 West 
57th St., Los Angeles, CA 90056, 213/649-6434. The standard 
MIDI file format addendum is an additional $5. 

MIDI messages are broken down into two main categories,
Channel and System, which in turn are broken down into
subcategories. Figure 3 illustrates the hierarchy and describes
the message types.

Channel Messages: Contain four bits to direct the message
to one of the 16 channels. Only units whose channel number
matches the channel encoded in the Status byte are of
interest.

Voice: Used to control an instrument's voices.
Mode: Used to control the way an instrument responds to
Voice messages.

System Messages: Intended for all connected units.
Common: Simple messages for communicating with all units.
Real-Time: Specialized Common messages used for synchronization
and timing.

Exclusive: Manufacturer-specific messages made up of a
manufacturer's MIDI ID number and a variable number of
Data bytes.

Figure 3. MIDI message organization.

Table 1. Bytes required for messages.

All MIDI communication consists of a Status byte followed
by a number of Data bytes (see Table 1 for the number of data
bytes required). The distinction between Status and Data is 
made with the Most Significant Bit; for Status bytes MSB=1, 
while for Data MSB=0. When a Status byte is received, Data 
bytes immediately follow (except for Real-Time messages). If 
a new Status byte is received before the prior message is com- 
pleted, the old message is ignored in 
favor of the new one. Table 1 shows 
the Status byte values associated with 
each type of message. When Voice or 
Mode messages are received, the in- 
terface remains in that Status until in- 
terrupted by a different Status byte 
(Real-Time Status bytes interrupt only
temporarily). As a result, less
data needs to be transferred:
If the next message has the same Status,
the Status byte can be omitted and only
the Data bytes sent, which is extremely useful
when transmitting long strings of data.
Any unimplemented or undefined Sta- 
tus bytes and subsequent Data bytes 
are ignored. 
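The rules in this paragraph can be sketched as a small byte-at-a-time parser. This is an illustration only, under several assumptions: the names midi_parser, midi_data_length(), and midi_parse_byte() are hypothetical, the Data byte counts come from the standard MIDI message definitions rather than from Table 1 as printed, and System messages are recognized but skipped to keep the sketch short.

```c
/* Number of Data bytes that follow a Status byte. Returns -1 for
   Exclusive messages (variable length) and undefined Status values. */
int midi_data_length(unsigned char status)
{
    if (status < 0x80)
        return -1;                      /* MSB clear: not a Status byte */
    if (status < 0xF0) {                /* Channel messages */
        unsigned char kind = status & 0xF0;
        /* Program Change and Channel Pressure carry one Data byte. */
        return (kind == 0xC0 || kind == 0xD0) ? 1 : 2;
    }
    if (status >= 0xF8)
        return 0;                       /* Real-Time: no Data bytes */
    switch (status) {
    case 0xF1: return 1;                /* MIDI Time Code quarter frame */
    case 0xF2: return 2;                /* Song Position Pointer */
    case 0xF3: return 1;                /* Song Select */
    case 0xF6: return 0;                /* Tune Request */
    default:   return -1;               /* 0xF0, 0xF7, and undefined */
    }
}

struct midi_parser {
    unsigned char status;   /* Status currently in effect, 0 if none */
    unsigned char data[2];  /* Data bytes collected for this message */
    int needed;             /* Data bytes the current Status requires */
    int count;              /* Data bytes collected so far */
};

/* Feed one received byte. Returns 1 when a complete Channel message is
   available in p->status and p->data, otherwise 0. */
int midi_parse_byte(struct midi_parser *p, unsigned char byte)
{
    if (byte >= 0xF8)
        return 0;           /* Real-Time: leaves the message in progress alone */
    if (byte & 0x80) {      /* new Status: any partial message is dropped */
        p->count = 0;
        if (byte < 0xF0) {
            p->status = byte;
            p->needed = midi_data_length(byte);
        } else {
            p->status = 0;  /* System messages are skipped in this sketch */
        }
        return 0;
    }
    if (p->status == 0)
        return 0;           /* Data byte with no Status in effect */
    p->data[p->count++] = byte;
    if (p->count < p->needed)
        return 0;
    p->count = 0;           /* keep the Status so it can be reused */
    return 1;
}
```

Feeding the bytes 0x90, 60, 100, 62, 100 through midi_parse_byte() reports two complete Note On messages even though the Status byte was sent only once.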

Each MIDI transmitter or receiver 
can be in one of four Channel modes 
for use in voice assignment, as Table 2 
illustrates. Each unit is set with Omni 
on or off and set to Poly or Mono. 
Omni refers to the equipment re- 



sponding to all Voice Channels (on) without discrimination 
or responding only to the selected Voice Channels (off). Mono
forces one voice per Voice Channel (monophonic), while Poly 
allows any number of voices to be allocated (polyphonic). At 
power-up all equipment defaults to mode 1. 
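As a rough sketch of how a receiver might apply these settings (the structure and function names are hypothetical, and the channel range used for mode 4 follows the receiver description in Table 2):

```c
#include <assert.h>

struct channel_mode {
    int omni_on;   /* nonzero: respond to every Voice Channel */
    int poly;      /* nonzero: polyphonic; zero: monophonic */
};

/* Translate a mode number (1 through 4) into its two settings. */
struct channel_mode midi_channel_mode(int mode)
{
    struct channel_mode m;

    assert(mode >= 1 && mode <= 4);
    m.omni_on = (mode == 1 || mode == 2);
    m.poly    = (mode == 1 || mode == 3);
    return m;
}

/* Does a receiver whose basic channel is n respond to a Voice message
   on channel chan? The argument m is the number of voices and matters
   only in mode 4, where channels n through n+m-1 are accepted. */
int midi_responds(int mode, int n, int m, int chan)
{
    struct channel_mode cm = midi_channel_mode(mode);

    if (cm.omni_on)
        return 1;
    if (cm.poly)
        return chan == n;
    return chan >= n && chan < n + m;
}
```

For example, a mode 3 receiver on channel 3 ignores a message on channel 9, while a mode 1 receiver accepts it.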

MIDI AND THE AMIGA 

The Amiga has always been a strong contender for the 
MIDI market. Its offering of true multitasking and a flexible 
architecture lends itself well to MIDI software and hardware. 
Many low-cost MIDI interfaces are available for connection 
to the Amiga serial port. One drawback is their reliance on a 
software-buffered serial port that, under heavy system load, 
can result in lost or incorrectly received data. Currently, there 
are professional-quality programs readily available and in 
use. Bars and Pipes Professional (from Blue Ribbon Sound- 
Works), for example, offers everything necessary for profes- 
sional applications. At the time of this writing, Great Valley 
Products is working on a dual serial card with two built-in 
MIDI interfaces. Using such a card or another hardware- 
buffered serial port combined with one of the many high- 
quality MIDI programs available will result in a sound MIDI 
solution. 

With the Amiga's growing acceptance as a multimedia 
platform, MIDI can only contribute to its popularity. Soon, 
products may be available that will allow you to compose a 
score for a film that is playing in an on-screen window. CDTV 
offers another venue that demands high-quality music for its 
programs. Overall, the future is bright for MIDI and the Ami- 
ga. An understanding of the spec is your first step towards 
making your own contribution to the growing pool of avail- 
able hardware and software. ■ 

Mike Harris is a design engineer at Great Valley Products, where
he develops new hardware and ASICs. He has been working with the
Amiga since 1986. Contact him c/o The AmigaWorld Tech Journal,
80 Elm St., Peterborough, NH 03458, or on BIX (harris).
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Voice messages are received in Voice Channel N only, and are 

assigned to voices polyphonically. 

Voice messages are received in Voice Channels N thru N+M-1, and
assigned monophonically to voices 1 thru M, respectively. The
number of voices M is specified by the third byte of the Mono
Mode Message.


TRANSMITTER

Mode  Omni  Poly/Mono  Description
1     On    Poly       All voice messages are transmitted in Channel N.
2     On    Mono       Voice messages for one voice are sent in Channel N.
3     Off   Poly       Voice messages for all voices are sent in Channel N.
4     Off   Mono       Voice messages for voices 1 thru M are transmitted in Voice
                       Channels N thru N+M-1, respectively. (Single voice per channel.)

Table 2. Possible MIDI channel modes.
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Designing a 
Device Driver 



By Dan Babcock 



EMBEDDING DEVICE-RELATED code in your final ap- 
plication at first may seem less painful than defining a driv- 
er for your unique widget, but it limits the options later. You 
and other programmers will have to rewrite the instructions 
in each piece of software that plans to support it. By defining 
a driver, you simplify future work for yourself and other pro- 
grammers. To make it even easier, chances are you may nev- 
er need to write a unique driver; emulating the programmer 
interface of one of the established system drivers — most com- 
monly trackdisk.device, serial.device, or parallel.device — 
usually does the trick. Existing software will work with your 
new, similar device — for example, a hard-drive controller — 
without alteration. 

Before we examine the basics of crafting a driver, note that 
I assume you have some knowledge of device I/O from the 
application's point of view. 

A DEVICE IS A LIBRARY 

Believe it or not, although libraries and devices seem rad- 
ically different from an application's point of view, they 
share the same overall structure. A library consists of a jump 
table that the program uses to access the provided routines, 
a global data structure that the library's routines use as they 
wish, and a minimum of three standard routines: Open(),
Close(), and Expunge(). A device shares the same form, and
adds two more special routines: BeginIO() and AbortIO().
Once you understand the function of these two routines, 
you've got it! 

BEGINIO()

BeginIO() is the workhorse entry point in a device. All input
and output requests from the user result in a call to BeginIO().
The BeginIO() vector may be called directly by the
user, but more commonly it is called as a result of SendIO()
or DoIO(). Very little work is performed by SendIO() or
DoIO(); they do little more than call the BeginIO() vector.
(An exact description of SendIO() and DoIO() is provided
later.) Passed to BeginIO() are a pointer to an IORequest in
A1 (containing the command number and parameters) and
the device-base pointer in A6 (just like a library).
The BeginIO() routine examines the command code
(IO_COMMAND) and determines what, if anything, should
be done. The only mandatory features of the BeginIO() routine
are that it must call ReplyMsg() for the IORequest if
the quick bit is zero and must set the LN_TYPE of the IORequest
to NT_MESSAGE. (The latter requirement is an obscure
bug fix.) Other than that, the implementation of
BeginIO() is left to your imagination.

Implementation would be trivial if not for a couple of issues.
First, BeginIO() may be concurrently called from many tasks.
The routines in a driver that manipulate the hardware are almost
never reentrant, however, so this is a major issue. Second,
by convention BeginIO() is expected to not take a "long
time" to complete. A "long time" is really a duration that
depends on an external event that may or may not occur in
a known period. The reason is that applications expect (and,
in fact, sometimes depend on) asynchronous operation
when they call SendIO(). In this case, BeginIO() does not
actually perform the I/O, but merely passes the request to an
independent task for further processing.

The most common solution to these problems is to set up
an independent task (or tasks) associated with the driver.
BeginIO() simply passes the I/O request to the proper task and
returns. This is usually accomplished by using PutMsg() to
send the IORequest to a message port associated with the
task. PutMsg() uses the linked-list fields at the start of an IORequest
structure (conveniently provided for this purpose) to attach
the IORequest to the message port, and signals the task
to wake it from its dormant state. The task then calls GetMsg()
to access the IORequest at the top of the list (and unlink it) and
proceeds to execute the requested command. This arrangement
neatly solves the reentrancy problem, provides for asynchronous
operation, and serializes incoming requests.
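The queueing arrangement can be modeled in a few lines of portable C. This is a simulation for illustration, not Exec code: sim_request, sim_port, sim_put_msg(), and sim_get_msg() are hypothetical stand-ins for the IORequest, the message port, PutMsg(), and GetMsg(), and the real functions also protect the list against concurrent access, which this sketch omits.

```c
#include <stddef.h>

/* Stand-in for an IORequest: the link field comes first, just as the
   real structure begins with its linked-list fields. */
struct sim_request {
    struct sim_request *next;
    int command;
};

/* Stand-in for the message port owned by the driver task. */
struct sim_port {
    struct sim_request *head;
    struct sim_request *tail;
    int signaled;              /* models the signal that wakes the task */
};

/* What BeginIO() would do: append the request and wake the task. */
void sim_put_msg(struct sim_port *port, struct sim_request *req)
{
    req->next = NULL;
    if (port->tail != NULL)
        port->tail->next = req;
    else
        port->head = req;
    port->tail = req;
    port->signaled = 1;
}

/* What the driver task would do: unlink and return the oldest
   request, or NULL if the queue is empty. */
struct sim_request *sim_get_msg(struct sim_port *port)
{
    struct sim_request *req = port->head;

    if (req != NULL) {
        port->head = req->next;
        if (port->head == NULL)
            port->tail = NULL;
        req->next = NULL;
    }
    return req;
}
```

Requests come out in the order they went in, which is what serializes access to the hardware.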

This is a lot of work for the device, but it benefits all applications:
Rather than an application being forced to spawn
a plethora of tasks to handle I/O, the application simply uses
SendIO() to achieve concurrency. The amount of work this
saves all application writers makes device drivers worthwhile.
In fact, this is almost the entire point of having device
drivers, as opposed to simple libraries of functions.

Occasionally, you can take another, simpler, approach to
writing BeginIO(). Some drivers do not require asynchronous
operation to be useful; it is only a frill. In this case, you may
choose to dispense with the task business and do everything
within BeginIO(). That leaves only the problem of multiple
tasks calling BeginIO() concurrently. A simple semaphore
suffices to solve this: BeginIO() calls ObtainSemaphore(), does
its work, and then finishes by calling ReleaseSemaphore(). If
the BeginIO() routine is busy when it is called, the task that
called BeginIO() will sleep until the semaphore is released by
a call to ReleaseSemaphore(). The code savings of this approach
over the task method is considerable.

Remember, however, that some types of drivers — such as se- 
rial and disk drivers — absolutely must support asynchronous 
I/O. Disk drivers are a good example of how a multitasking 
OS can get things done while slow hardware is working. File 
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Why continually include specific device instructions in your programs 
when you can call a driver that does the work for you? 



systems can send out asynchronous requests for blocks and 
process the next block while the previous one is fetched. In 
such cases, the simpler approach is not an option. 

ABORTIO()

The other device-unique entry point, AbortIO(), is passed
a pointer to an IORequest in A1 and the device-base pointer
in A6, and is expected to attempt to make the I/O job abort
and return more quickly than usual. If successful, AbortIO()
returns zero in D0 and stores IOERR_ABORTED in IO_ERROR.
For some devices (as for disks), aborting is not critical
and may be ignored. For others (such as the serial device),
the AbortIO() routine is very important and must be handled
properly.

To do so requires a bit of advance planning. Every time
the driver task (or tasks) calls the Exec Wait() function to wait
for an event, it should simultaneously wait for a special abort
signal. To force an early exit, the AbortIO() routine then can
send this abort signal to the relevant task. The I/O operation
that is to be aborted might not actually be active at the time
AbortIO() is called, however. There are two such cases: either
the IORequest has already finished or the IORequest is
sitting in a queue waiting to be processed. Note that it is illegal
for an application to abort an I/O request that has not
been initiated yet, so that scenario is not a concern. If the
LN_TYPE field of the IORequest is NT_REPLYMSG, then the
I/O has completed, and ReplyMsg() has processed the request.
The AbortIO() routine should simply exit. To determine
whether the request is currently being processed, AbortIO()
simply compares the IORequest pointer passed to
AbortIO() with the current IORequest, which should be
maintained in the task code for this purpose. This field
should be set to zero when no I/O requests are being processed
to avoid confusion and to act as a task-busy flag. If
the IORequest is not active, but waiting to be processed, then
the IORequest may be "defused" by setting a special "ignore"
flag in the IO_FLAGS byte of the IORequest; the upper
four bits are reserved for the driver's use. When the task
fetches the IORequest from the queue (message port), it can
test the ignore flag; if it is set, ReplyMsg(IORequest) is executed
and no further action is needed.
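The three cases can be summarized as a small decision function. This is a portable model rather than driver code: the structure, the action names, and SIM_IGNORE_FLAG (one arbitrarily chosen bit from the upper four) are hypothetical, and the node-type value is assumed to match the one in the Exec include files.

```c
#define SIM_NT_REPLYMSG 7     /* assumed to mirror NT_REPLYMSG */
#define SIM_IGNORE_FLAG 0x10  /* hypothetical driver-private flag bit */

/* Stand-in for the two IORequest fields that AbortIO() examines. */
struct sim_ioreq {
    unsigned char type;   /* models LN_TYPE */
    unsigned char flags;  /* models IO_FLAGS */
};

enum abort_action {
    ABORT_NOTHING,        /* request already finished: simply exit */
    ABORT_SIGNAL_TASK,    /* request is active: send the abort signal */
    ABORT_MARK_IGNORED    /* request is queued: it has been defused */
};

/* Decide what to do with req. The argument current is the request the
   driver task is working on, or a null pointer if the task is idle. */
enum abort_action sim_abort_io(struct sim_ioreq *req,
                               const struct sim_ioreq *current)
{
    if (req->type == SIM_NT_REPLYMSG)
        return ABORT_NOTHING;
    if (req == current)
        return ABORT_SIGNAL_TASK;
    req->flags |= SIM_IGNORE_FLAG;
    return ABORT_MARK_IGNORED;
}
```

A real driver must also guard these tests against the task changing the current request while AbortIO() is looking at it, which is part of what makes the routine tricky.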

Implementing the AbortIO() feature can be quite tricky. If
your driver requires an abort feature, take a close look at the
AbortIO() routine in the example serial driver provided in
the accompanying disk's Babcock drawer.

Discussions of the other standard entry points follow: Open(),
Close(), and Expunge(). Although the descriptions
generally apply to the library versions of these routines, some
of the details are different. These routines, as well as
BeginIO() and AbortIO(), follow the usual Exec register
conventions: D0, D1, A0, and A1 are scratch; all other registers
(except status) must be preserved.

OPEN()

Called by OpenDevice(), the Open() routine receives the device-base
pointer in A6 (as usual) plus the OpenDevice() parameters:
the IORequest pointer in A1, the unit number in D0,
and the flags in D1. Multitasking is disabled by Exec before
calling Open(). What the Open() routine does is entirely up to
the device. Usually it sets up for a particular unit by creating
a unit task (or tasks), setting up a unit-specific data structure,
initializing unit-specific hardware registers, and so on.

The driver can keep track of which unit goes with which
IORequest by storing a pointer to the unit-specific data structure
(or any other convenient indicator) in the IO_UNIT field
of the IORequest provided for this purpose. Applications do
not use IO_UNIT. The OpenDevice() routine automatically
stores the device-base pointer in the IO_DEVICE field of the
IORequest, for the later use of the application, Exec, and
driver.

CLOSE()

The Close() routine is called from CloseDevice() with the
IORequest in A1. Multitasking is disabled by Exec before
calling Close(). Usually, Close() checks for outstanding
Open() calls on this unit. If there are none, it releases various
unit-specific resources. In addition it might choose to call
Expunge() (see below) if there are no openers for the entire device.
(Expunge() is never called if your open count is nonzero.)
If the device should be unloaded from memory, Close()
returns the segment list (as given to the initialization routine;
see below); otherwise it returns zero in D0.

Ramlib (used when the driver is first called) uses
semaphores to prevent multithreading Open() and Close();
Init/Forbid() is a side effect. Expunge() is called from the
memory allocator AllocMem() with only Forbid().

EXPUNGE()

The Expunge() routine of a device or library is called by
AllocMem() under emergency low-memory situations. If possible
(nothing is currently using the device), the device should
deallocate all memory buffers, close anything that needs to
be closed, and generally clean up. The device is removed
from memory if Expunge() returns the segment list in D0; otherwise
Expunge() should return zero in D0. Multitasking is
disabled by Exec during the Expunge() routine.

Note, however, that drivers should deallocate and release
important resources on the last Close(), not Expunge(). Do not
pattern your device after the bug that allows serial.device to
hold onto miscellaneous resources until Expunge().
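The interplay between Close() and Expunge() can be modeled with an open count. The sketch below is a simulation with hypothetical names; in particular, the delayed-expunge flag is an assumption based on common practice for Exec libraries and is not something the text above prescribes.

```c
#define SIM_DELAYED_EXPUNGE 0x01  /* hypothetical flag bit */

/* Stand-in for the parts of the device base that matter here. */
struct sim_device {
    int open_count;   /* outstanding Open() calls */
    unsigned flags;
    long seglist;     /* saved by the initialization routine; nonzero */
};

/* Expunge(): returns the segment list if the device may be unloaded,
   or zero if something is still using it. */
long sim_expunge(struct sim_device *dev)
{
    if (dev->open_count != 0) {
        /* Still in use: remember the request for the last Close(). */
        dev->flags |= SIM_DELAYED_EXPUNGE;
        return 0;
    }
    /* A real driver would free its buffers and unlink itself here. */
    return dev->seglist;
}

/* Close(): the last closer releases resources and honors a pending
   expunge request. Returns the segment list or zero, like Expunge(). */
long sim_close(struct sim_device *dev)
{
    if (--dev->open_count > 0)
        return 0;
    /* Release unit-specific resources here, not in Expunge(). */
    if (dev->flags & SIM_DELAYED_EXPUNGE)
        return sim_expunge(dev);
    return 0;
}
```

With two openers, an expunge request returns zero, the first Close() returns zero, and the second Close() returns the saved segment list.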

THE BIG PICTURE 

Up to now the discussion has been from the device's point 
of view. To put it all together, examine the application-Exec-device
interaction in its entirety:

Synchronous I/O 

1. The application calls Exec DoIO() with the IORequest
pointer in A1.

2. DoIO() sets the IO_QUICK bit of IO_FLAGS by moving 1
into IO_FLAGS.

3. DoIO() loads IO_DEVICE into A6 and calls the device's
BeginIO() vector.

4. The BeginIO() routine executes the command (in IO_COMMAND)
immediately or sends the request to a task and returns.
In the latter case, BeginIO() clears the IO_QUICK bit.

5. DoIO() calls WaitIO().

6. WaitIO() checks the IO_QUICK bit: if it is set, WaitIO()
loads the error code into D0 and returns immediately; otherwise
it calls Wait() until it receives the reply port's signal bit,
unlinks the IORequest from the reply port's message queue,
and returns with the error code (taken from IO_ERROR) in
D0. (The signals may remain set!)

Asynchronous I/O 

1. The application calls Exec SendIO() with the IORequest
pointer in A1.

2. SendIO() clears the IO_FLAGS byte (with the intent to clear
IO_QUICK).

3. SendIO() loads IO_DEVICE into A6 and calls the device's
BeginIO() vector.

4. The BeginIO() routine should merely send the request to
an associated task and return, if the request will take some
time to satisfy. It may optionally perform the request right
away.

5. SendIO() returns.

In all cases the driver should call ReplyMsg(IORequest) if
and only if the IO_QUICK bit is clear. In addition, if the driver
does not want to complete a given request immediately, it
must clear the QUICK bit before returning from BeginIO().
(The associated task should perform a ReplyMsg() when finished.)
Note that it is always safe to clear the QUICK bit, but
never safe to set it.
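The quick-bit rules are easier to follow when modeled in code. What follows is a portable simulation, not Exec or driver source: every name is hypothetical, SIM_IOF_QUICK is assumed to occupy the low bit as step 2 of the synchronous case implies, and the replied field merely records whether the stand-in for ReplyMsg() was called.

```c
#define SIM_IOF_QUICK 0x01  /* assumed position of the IO_QUICK bit */

/* Stand-in for the IORequest fields involved in the protocol. */
struct sim_io {
    unsigned char flags;  /* models IO_FLAGS */
    signed char   error;  /* models IO_ERROR */
    int           replied;
};

static void sim_reply(struct sim_io *io) { io->replied = 1; }

/* What DoIO() and SendIO() do to the flags before calling BeginIO(). */
void sim_do_io_setup(struct sim_io *io)
{
    io->flags = SIM_IOF_QUICK;
    io->replied = 0;
}

void sim_send_io_setup(struct sim_io *io)
{
    io->flags = 0;
    io->replied = 0;
}

/* A BeginIO() that completes the request on the spot: it replies if
   and only if the quick bit is clear. */
void sim_begin_io_immediate(struct sim_io *io)
{
    io->error = 0;
    if (!(io->flags & SIM_IOF_QUICK))
        sim_reply(io);
}

/* A BeginIO() that hands the request to a task: it clears the quick
   bit (always safe) and leaves the reply to the task. */
void sim_begin_io_deferred(struct sim_io *io)
{
    io->flags &= (unsigned char)~SIM_IOF_QUICK;
}

/* The task's last step for a deferred request. */
void sim_task_finish(struct sim_io *io)
{
    io->error = 0;
    sim_reply(io);
}

/* The test WaitIO() makes: a set quick bit means the request already
   completed inside BeginIO(), so there is nothing to wait for. */
int sim_must_wait(const struct sim_io *io)
{
    return !(io->flags & SIM_IOF_QUICK);
}
```

Running the synchronous case through the immediate routine leaves the quick bit set with no reply sent, while the deferred routine forces the caller to wait for the task's reply.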

HOW A DEVICE IS INITIALIZED 

The device jump table and other structures necessary to 
make a device functional do not appear by magic; something 
has to create them. Fortunately, this is not the responsibility 
of the device driver. The only requirement is that the device 
begin with a RomTag structure. If the driver is present in the 
DEVS: path, then AmigaDOS will (in OpenDevice()) load the
driver and call InitResident() to initialize the device (or library).
The RomTag and related structures may be directly
copied from one of the sample drivers without much com- 
plication, so I won't describe them here. The most interest- 
ing part of this process is that an initialization routine is called 
in the driver. The initialization routine is a very convenient 
place to perform one-time-only set-up tasks. The routine is 



called with the device-base pointer in DO and the AmigaDOS 
segment list in AO. The initialization routine is responsible for 
saving the segment list for the later use of Close() or Ex- 
punge() (see above). If all goes well, the routine should return 
the device pointer in DO, or return zero in DO to indicate an 
error. Note that the initialization routine follows the usual 
Exec register convention. 

DEBUGGING 

Unfortunately, there are no truly adequate tools available 
for debugging complex drivers (ignoring very expensive 
hardware solutions). One useful technique is to output de- 
bugging text via the internal serial port. See the PUTDEBUG 
macro in the serial driver for an example. PUTDEBUG ac- 
cesses the hardware directly, so it may be used anywhere, in- 
cluding interrupt routines. The downside is that it destroys the 
normal timing of events, hiding bugs and precluding some 
tests. In the end, there is no substitute for reading the source 
code and knowing that it must work. This ideal is often 
thwarted in the real world, however, by cryptic or incomplete 
documentation for the hardware that the driver is control- 
ling. Patience is definitely a virtue when debugging drivers! 

In the accompanying disk's Babcock drawer, you will find
two example drivers written in assembly language, as is
typical of most device drivers. RAMDISK.ASM acts as a simple
RAM disk and represents a "minimal" device driver.
NEWSER.ASM is a full-blown serial driver for the Rockwell
65C52, demonstrating all the intricacies of a typical "real"
driver. Another good guide is Commodore's example driver
printed in the Amiga ROM Kernel Reference Manual: Libraries &
Devices; the comments it contains represent the official guide
to writing drivers. I strongly encourage you to examine these
examples, as they contain a wealth of useful information.

INSIDE THE RAM-DISK DRIVER 

A RAM disk is a good first example because it can be greatly
simplified. In this sample, the initialization routine
allocates a buffer of 880K (which you may change to any value)
and stores the pointer in the device-base structure. Open(),
Close(), and Expunge() do essentially nothing; only one unit
is supported, so the unit number is not checked. The device
is never removed from memory. The BeginIO() routine does
not use tasks or semaphores, because they are not needed for
a RAM disk. It accepts two commands, CMD_READ and
CMD_WRITE. It returns all other commands with IO_ACTUAL=0
without doing anything. AbortIO() does nothing, because there
is no way or reason to abort a request. All in all, a RAM disk
is not very representative of a "real" driver, but it is a very
nonthreatening introduction to drivers and useful in its own
right.
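
The command handling described above can be sketched in portable C. This is not the RAMDISK.ASM source: the structure, the SIM_ names, and the bounds check are assumptions made for illustration, and a static array stands in for the buffer that the real driver allocates at initialization.

```c
#include <string.h>

#define SIM_CMD_READ   2                  /* illustrative command codes */
#define SIM_CMD_WRITE  3
#define SIM_DISK_SIZE  (880UL * 1024UL)   /* 880K, as in the article */

struct sim_io {
    int           command;   /* models IO_COMMAND */
    unsigned long offset;    /* byte offset into the disk */
    unsigned long length;    /* bytes requested */
    unsigned long actual;    /* models IO_ACTUAL: bytes transferred */
    void         *data;      /* caller's buffer */
};

static unsigned char sim_disk[SIM_DISK_SIZE];

/* Handle one request immediately; no tasks or semaphores are involved. */
static void sim_ramdisk_begin_io(struct sim_io *io)
{
    io->actual = 0;

    /* Assumed safety check: refuse transfers that run past the buffer. */
    if (io->offset > SIM_DISK_SIZE || io->length > SIM_DISK_SIZE - io->offset)
        return;

    switch (io->command) {
    case SIM_CMD_READ:
        memcpy(io->data, sim_disk + io->offset, io->length);
        io->actual = io->length;
        break;
    case SIM_CMD_WRITE:
        memcpy(sim_disk + io->offset, io->data, io->length);
        io->actual = io->length;
        break;
    default:
        /* Any other command returns with actual == 0. */
        break;
    }
}
```

Because every request completes inside the call, this model never needs to clear a QUICK bit or hand work to a task, which is what makes a RAM disk such a small driver.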

INSIDE THE SERIAL DRIVER 

The example serial driver is a real-world example of a de- 
vice driver. It represents close to the worst case in terms of 
complexity. This serial driver is unusual in that it uses two 
tasks per unit — one dedicated to reading, the other to writ- 
ing, because software expects to be able to read and write con- 
currently. It would be more consistent if the application were 
required to open separate units for reading and writing, but 
this arrangement works fine. The BeginIO() routine examines 
the IO_COMMAND field of the IORequest: if it is 
CMD_READ, the request is directed to the read task. If Be- 

Continued on p. 63 
Spawning Tasks



By Steve Krueger 



YOUR PROGRAMS CAN launch new processes either
synchronously or asynchronously. Synchronous creation is
used most often when a program needs to execute a second
program and wait for it to finish before continuing. The
standard make utility is an example of this type of program.
Synchronous processes do not take full advantage of the
multitasking abilities of AmigaDOS. In many situations, you
might want one process performing a task in the background
while another responds to user input; in other words, tasks
executing asynchronously. As synchronous processes are the
simplest to create, we'll examine them first.

The system call Execute() is the most common way to
implement a new process synchronously. Execute() takes three
parameters (a command string, an input file handle, and an
output file handle) and creates a new CLI that executes the
specified commands as if they had been typed in at the Shell
prompt. Once the command or commands are executed, the
new CLI terminates and the program that called Execute()
continues to run. For example, to use this call to display a
directory, you would use the following:

Execute("dir", NULL, Output());

This executes the command DIR and displays its output to
the output file handle, Output(). The call will not work if the
program is run from Workbench because Output() does not
exist. If the output file handle is NULL, the output goes to the
current window. Once again, there is no current window
under Workbench.

To open a new window and send the output to it, use the
code fragment below:

fh = Open("CON:", MODE_OLDFILE);
Execute("dir", NULL, fh);

If the second parameter (the input file handle) is not NULL, 
then the CLI reads from the input file handle after the com- 
mand string is executed. This continues until the end of the 
file is encountered. You can use this feature to execute a script 
file from within a program. If the command string is NULL, 
input is read from the input file handle immediately. 

The Execute() routine has a couple of limitations. When a
process run from Workbench calls Execute(), the routine has
no default input file handle, and it inherits no path. You must
provide an input file handle when running under Workbench.
This can be accomplished by calling Open() for NIL:.
The path is optional. To provide one, you must create a fake
CLI environment and copy the path from the Workbench
process. A second drawback to Execute() is that it does not
return the return code of the command it executes.
Commodore added a new routine, System(), to Amiga OS 2.0 to
remedy this situation.

The System() routine is similar to Execute(), but has a few
significant differences. It can execute a program either
synchronously or asynchronously, and it will not read from the
input file handle. The return from System() is -1 if the
command could not be executed, otherwise it is the return code
for the program. System() takes two parameters: a command
string, and a pointer to a tag list. (See "Digging Deep in the
OS," p. 8 for an introduction to tags and tag lists.) In the tag
list, you can specify the input file handle, output file handle,
synchronous or asynchronous execution, and the type of
Shell to be used. For more information, refer to
dos/dostags.h. Remember, System() is available only under
version 2.0 of the operating system.

Both System() and Execute() search the resident list for the
specified command. There is no other documented method
of executing commands from the resident list.

ASYNCHRONICITY

The setup required to create a new process asynchronously
is more involved than synchronous creation. The table below
outlines the steps that the parent (original) and child
(spawned) processes follow for specific periods of time.

Parent                        Child

Set up fake seglist           ■
Call CreateProc()             ■
Set up start-up message       Wait for start-up message
Send start-up message         Gather needed info from startup
Continue executing            Execute user function
Wait for reply to start-up    Reply to start-up message and terminate
Free child resources          ■

"■" indicates that the child process does not exist at this time.

The system routine that does most of the work is
CreateProc(). CreateProc() takes four parameters: the process
name, a seglist, the priority, and the stack size. The process
name is a NULL-terminated string that need not be unique.
The priority is an integer value specifying the priority at
which the new process will run. The size of the new process's
stack is specified in LONGWORDS. The seglist is a BPTR
pointing to the first code hunk that will be executed. You can
obtain a seglist by calling LoadSeg() or create a fake one. Use
LoadSeg() if the new process is going to execute a new
program. If the new process is to run in the same code as the
current program, you must create a fake seglist. Let's try the
latter method.
Follow these steps to easily create synchronous and
asynchronous processes.
Commodore's AutoDocs for CreateProc() suggest the following
code to create a fake seglist:

DS.L 0 ;Align to longword
DC.L 16 ;Segment "length" (faked)
DC.L 0 ;Pointer to next segment
...start of code...

The example provided below and on disk (in the Krueger
drawer) accomplishes the same thing as the assembler
fragment above, but is written in C. The structure definition
for the fake seglist is:

struct FAKE_SegList {
    long space;
    long length;
    BPTR nextseg;
    short jmp;
    void (*func)();
};

The first three fields correspond to the three fields
specified in the AutoDocs. The jmp field is initialized to the
hexadecimal value for the JMP instruction. The function
pointer, func, is loaded with the address of the function
process_starter(). These two fields form the first code hunk
for the new process that will be created by CreateProc(). By
using this method, you avoid using any assembly language stub
routines.

After the fake seglist is created, it is passed to CreateProc(),
with the process name, priority, and stack size. Remember,
the seglist pointer must be converted to a BPTR. CreateProc()
will create a new process that will begin executing at the JMP
instruction of the fake seglist, and then jump to
process_starter(). This function waits for the start-up
message, performs the necessary initialization, calls the
designated user function, and then replies to the start-up
message, signifying that the child process has ended.
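
To make the layout concrete, here is a portable sketch that builds the bytes of such a fake seglist and shows the BPTR arithmetic. It is an illustration, not the code on disk: all names are hypothetical, the image is written big-endian by hand because a modern host compiler would pad the structure differently, and it assumes that the jmp and func fields sit next to each other (as they do with the 68000's two-byte alignment) and that 0x4EF9 (JMP to an absolute long address) is the opcode stored in jmp.

```c
#include <stdint.h>

#define SIM_JMP_ABS_LONG 0x4EF9u   /* 68000 opcode: JMP to a 32-bit address */

/* Store big-endian values, as the 68000 would see them in memory. */
static void put16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)v;
}

static void put32(uint8_t *p, uint32_t v)
{
    put16(p, (uint16_t)(v >> 16));
    put16(p + 2, (uint16_t)v);
}

/* A BPTR is a longword-aligned address divided by four. */
static uint32_t sim_to_bptr(uint32_t addr)   { return addr >> 2; }
static uint32_t sim_from_bptr(uint32_t bptr) { return bptr << 2; }

/* Fill an 18-byte image laid out like the FAKE_SegList structure. */
static void sim_build_fake_seglist(uint8_t image[18], uint32_t func_addr)
{
    put32(image + 0, 0);                  /* space */
    put32(image + 4, 16);                 /* length (faked) */
    put32(image + 8, 0);                  /* nextseg: no further segments */
    put16(image + 12, SIM_JMP_ABS_LONG);  /* jmp */
    put32(image + 14, func_addr);         /* func: the JMP target */
}
```

The six bytes starting at offset 12 form one complete instruction, which is why the structure can serve as the first code hunk. Because a BPTR drops the low two address bits, the structure must sit on a longword boundary before its address is converted.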

The child waits for the start-up message by calling
WaitPort() with the message port from its process structure.
When WaitPort() returns, the start-up message is ready to be
fetched. You accomplish this with GetMsg(). The message
can be of any size, containing any information necessary to
run the new process. The only restriction is that the first item
in the message be a struct Message. This example uses the
start-up message to pass two pieces of additional information
to the child process: the global data pointer and a pointer to
the function that the user wishes to execute. The structure
used to hold this information is struct ProcMsg and is defined
in process.h. The start-up message is created by allocating
memory and filling in the desired fields. It is passed to the
child process via PutMsg().

Now that the parent process has sent the start-up message
and the child has received it, both processes can run
simultaneously. Eventually, the parent process must wait for
the child process to finish. In the example, wait_process()
accomplishes this by waiting on the reply port of the start-up
message. This is how the child process will signal its
completion. When the reply is received, the parent knows the
child has finished, and it is safe to free the reply port and
memory associated with the start-up message. Note that the
return code is passed from the child process to the parent
in the return_code field of the start-up message. This value
is extracted before the start-up message is freed.
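
The handshake can be sketched as a single-threaded model in portable C. Everything here is a hypothetical stand-in: sim_message plays the part of struct Message, sim_proc_msg is modeled on the description of struct ProcMsg (the real definition is in process.h on disk), and the "child" is simply called in turn instead of running as a separate process.

```c
#include <stddef.h>

struct sim_message {              /* stand-in for struct Message */
    struct sim_message *next;
    int replied;
};

struct sim_proc_msg {             /* modeled on the article's struct ProcMsg */
    struct sim_message msg;       /* must be the first member */
    void *global_data;
    int (*func)(void *global_data);
    int return_code;
};

struct sim_port { struct sim_message *head, *tail; };

/* Models PutMsg(): append to the port's queue. */
static void sim_put_msg(struct sim_port *port, struct sim_message *m)
{
    m->next = NULL;
    if (port->tail) port->tail->next = m; else port->head = m;
    port->tail = m;
}

/* Models GetMsg(): remove the oldest message, or return NULL. */
static struct sim_message *sim_get_msg(struct sim_port *port)
{
    struct sim_message *m = port->head;
    if (m) {
        port->head = m->next;
        if (!port->head) port->tail = NULL;
    }
    return m;
}

/* Child side: fetch the start-up message, run the user function, reply. */
static void sim_process_starter(struct sim_port *child_port)
{
    struct sim_message *m = sim_get_msg(child_port);
    struct sim_proc_msg *pm = (struct sim_proc_msg *)m;  /* msg is first */
    if (!pm) return;
    pm->return_code = pm->func(pm->global_data);
    m->replied = 1;               /* stands in for ReplyMsg() */
}

/* Parent side: -1 while the child is still running, else its return code. */
static int sim_wait_process(const struct sim_proc_msg *pm)
{
    return pm->msg.replied ? pm->return_code : -1;
}

/* Example user function: reads the shared "global data". */
static int sim_user_function(void *global_data)
{
    return *(int *)global_data + 1;
}
```

Placing the message header first is what allows the cast in sim_process_starter(): a pointer to the whole structure and a pointer to its first member refer to the same address.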

LOOK IT UP 

In the Krueger drawer, the example consists of three files:
main.c, process.c, and process.h. The first, main.c, is just a
driver program that calls the routines in process.c. It
contains three functions: main(), process1(), and process2().
By using the routines in process.c, the main() function creates
two child processes that run simultaneously, and then waits
for them to finish. The functions process1() and process2()
contain the code that the child processes will execute. Note
that both process1() and process2() call Execute(),
demonstrating that it is possible to combine the creation of
synchronous and asynchronous processes.

The file process.c contains the routines necessary to create 
new processes and wait for their termination. They may be 
used "as is" or modified as desired. The file process.h con- 
tains the necessary prototypes and structure definitions for 
process.c. 

To use these routines, include the process.h header file and
call the functions as follows:

msg = start_process(int (*func)(), long priority, long stacksize);
ret = wait_process(struct ProcMsg *msg);

where

msg        is a struct ProcMsg * defined in process.h.
func       is the function that will be the entry point of
           the new process.
priority   is the priority of the new process.
stacksize  is the stack size of the new process.
ret        is the return from the new process.

Remember to be careful about which functions you use to
create new processes. All functions must be re-entrant and
must not call exit(), as exit() will free all the memory
allocated by the main program while the main program
continues running.

You can customize the files process.c and process.h. Areas 
of interest are indicated by comments containing the string 
"user." These areas indicate where you can add code to pass 
additional information to the child process. 

The above example uses code from the current program as
the code for the new process. If you wish to create a new
process that executes code from a separate program, then use
LoadSeg() to obtain a seglist.

LoadSeg() takes one parameter, the name of the program
to load, and returns the seglist. If it is unable to load the
program, LoadSeg() returns NULL. You can then use the seglist
LoadSeg() provides to call CreateProc(). After this call,
everything is the same as in the previous example, with one
exception: When the child process finishes executing, you must
call UnloadSeg() to free the memory being used by the child
process.

This is the technique that the SAS/C routines forkl() and
forkv() use to create child processes. The fork() routines call
LoadSeg() and CreateProc(), while wait() is responsible for
waiting for the child process to finish and calling
UnloadSeg(). Remember that LoadSeg() will not load a command
from the resident list, and, therefore, you cannot use the fork
routines to execute any resident command.

COMMUNICATION BETWEEN PROCESSES 

The example described above uses the start-up message to
communicate all the information to the child process.
However, messages may be sent at any time and to any process.
All you need is a message port that is known by both
processes. For the example above, in which the parent creates
the child process, the parent could create another message port
and pass it to the child via the start-up message. In the case
of two independent processes, communication is accomplished
through a global message port. Communication with
the ARexx process is an example of this type of message
passing.

Signals may also be used to communicate between pro- 
cesses. This method is generally used to signal the termina- 
tion of a process. For example, assuming that the child pro- 
cess was set up to handle a CTRL-C signal, the parent could 
execute the following to signal the child to break: 

Signal((struct Task *)child_process, SIGBREAKF_CTRL_C);

The example above is not set up to receive signals. If a
child process created by the example code receives such a
signal while it is performing level-two I/O, the child process
will attempt to call the break handler routine. By default, the
break handler will call exit(), which will free all allocated
memory (including the parent's memory) before terminating. To
avoid this situation, the default break handler is set to NULL
before any child processes are created. In SAS/C, this is
accomplished with the line:

_ONBREAK = NULL;

SHARING MEMORY 

Because the Amiga has a single address space, all memory
is available to all processes. This makes sharing memory
between processes very easy, but also very dangerous. For
example, imagine the case in which one process is traversing a
linked list, while another is modifying it. The process
traversing the list could start accessing freed memory. To
avoid such situations, all references to memory that can be
modified must be surrounded by Forbid() and Permit() calls. In
the example above, register A4 is passed to the child process,
allowing the child and parent processes to use the same global
data. Therefore, all areas of code that access global data that
is not read-only must be surrounded by Forbid() and Permit()
calls. It is very important to be cautious of the library
routines you use as well.
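
The guarding pattern can be sketched in portable C as follows. The sim_forbid() and sim_permit() functions are hypothetical stand-ins that only count nesting; on the Amiga the real Forbid() and Permit() calls would take their place and actually suspend task switching.

```c
#include <stddef.h>

static int sim_forbid_depth = 0;   /* models the nesting count */

static void sim_forbid(void) { sim_forbid_depth++; }
static void sim_permit(void) { sim_forbid_depth--; }

struct sim_node {
    struct sim_node *next;
    int value;
};

static struct sim_node *sim_shared_list = NULL;   /* shared global data */

/* Writer: link a node in while task switching is "disabled". */
static void sim_list_push(struct sim_node *n)
{
    sim_forbid();
    n->next = sim_shared_list;
    sim_shared_list = n;
    sim_permit();
}

/* Reader: walk the list under the same guard so it cannot change mid-walk. */
static int sim_list_sum(void)
{
    int sum = 0;
    struct sim_node *n;

    sim_forbid();
    for (n = sim_shared_list; n != NULL; n = n->next)
        sum += n->value;
    sim_permit();
    return sum;
}
```

Both the reader and the writer take the guard, and every sim_forbid() is paired with a sim_permit() on every path out of the function. Keeping the guarded region short matters on a real system, because no other task runs while it is held.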

Avoid all library routines that are not re-entrant. The
SAS/C memory allocation and file I/O functions are two
examples of library routines to be avoided. Child processes
should use AmigaDOS system calls for memory allocation
and file I/O. The memory-allocation routines may be used if
extreme care is taken to surround the calls with Forbid() and
Permit(). Such stubs as the following may be used, but they
must be used by all processes sharing the global data,
including the parent process.

void *my_malloc(length)
int length;
{
    void *ret;

    Forbid();
    ret = malloc(length);
    Permit();
    return(ret);
}

This technique will not work with the file I/O routines, as
they call the AmigaDOS I/O routines that break Forbid()
calls.

THE SAMPLE PROGRAM 

The sample program has been provided in a Workbench
environment for the SAS/C 5.10 compiler. You may have to
make minor modifications for the program to run under
DICE or Manx C. Two locations where changes for Manx C
are required are denoted by comments.

Open the drawer containing the example. You should see
several icons. Double click on the BUILD icon to compile and
link the program. Double click on the TEST icon to run the
program. Three windows will appear. The first is the
standard window that opens as standard output for SAS/C
programs. Two more windows will open. The output that goes
to these two windows comes from two child processes running
simultaneously. The first process calls Execute() for the DIR
command, while the second calls Execute() for ECHO. Both
processes call Delay() so the windows do not close before
you get to see the output.

The multitasking ability of the Amiga is a valuable asset.
The code provided here makes it easy to create child
processes that can handle a wide variety of tasks. Because most
programs wait for user input for a large percentage of the
time, why not take advantage of this multitasking ability and
start using those idle CPU cycles?

Steve Krueger is a Systems Developer for SAS Institute. He was 
responsible for a major part of the 5.10 release, including LSE, LC2, 
BLINK, and ASM. Steve is currently working on the next Amiga 
release. Other projects include code generators for the Apollo 3000 
series and the HP700 series. Contact him c/o The AmigaWorld 
Tech Journal, 80 Elm St., Peterborough, NH 03458, or on BIX 
(skrueger). 
Programming
2.0's NewMenus 

More time-savers from gadtools.library. 



By Paul Miller 



IF YOU HAVE ever been frustrated with creating
menu-strips by hand, then you will love OS 2.0's NewMenus. In
addition to the new gadget classes I outlined in the
August/September '91 "Graphics Handler" (p. 14),
gadtools.library provides a very simple new menu format. Under
it, you can lay out an entire strip, including titles, items,
and subitems, with one, easy-to-understand array of structures.
GadTools handles all the details of menu and item positions and
sizes, linking of items, and text layout for you. (1.3
programmers, do not despair. Keep reading; I have a little
surprise for you later.)

THE APPETIZER 

To create a menu layout definition with NewMenus, all 
you have to do is fill out an array of NewMenu structures. 
Each element of the array will define a new menu title, item, 
subitem, or the end of the strip. The NewMenu structure is 
declared in libraries/gadtools.h: 

struct NewMenu
{
    UBYTE nm_Type;           /* See below */
    STRPTR nm_Label;         /* Menu's label */
    STRPTR nm_CommKey;       /* MenuItem Command Key Equiv */
    UWORD nm_Flags;          /* Menu or MenuItem flags */
    LONG nm_MutualExclude;   /* MenuItem MutualExclude word */
    APTR nm_UserData;        /* For your own use */
};

The nm_Type can be NM_TITLE, NM_ITEM, NM_SUB, or
NM_END. For example, here's a simple menu definition:

struct NewMenu sample_menu[] = {
    {NM_TITLE, "Project", NULL, NULL, NULL, NULL},
    {NM_ITEM, "New", "N", NULL, NULL, NULL},
    {NM_ITEM, NM_BARLABEL, NULL, NULL, NULL, NULL},
    {NM_ITEM, "Open...", "O", NULL, NULL, NULL},
    {NM_ITEM, "Close", NULL, NULL, NULL, NULL},
    {NM_ITEM, NM_BARLABEL, NULL, NULL, NULL, NULL},
    {NM_ITEM, "Save", "S", NULL, NULL, NULL},
    {NM_ITEM, "Save As...", "A", NULL, NULL, NULL},
    {NM_ITEM, NM_BARLABEL, NULL, NULL, NULL, NULL},
    {NM_ITEM, "Quit", "Q", NULL, NULL, NULL},

    {NM_TITLE, "Edit", NULL, NULL, NULL, NULL},
    {NM_ITEM, "Undo", "U", NULL, NULL, NULL},
    {NM_ITEM, NM_BARLABEL, NULL, NULL, NULL, NULL},
    {NM_ITEM, "Cut", "X", NULL, NULL, NULL},
    {NM_ITEM, "Copy", "C", NULL, NULL, NULL},
    {NM_ITEM, "Paste", "V", NULL, NULL, NULL},

    {NM_END, NULL, NULL, NULL, NULL, NULL}
};



As you can see, it is relatively easy to determine what the
menu strip will look like. Note that you no longer have to deal
with IntuiText structures; gadtools.library creates them for
you. GadTools will automatically generate a separator-bar
item for you if you specify the nm_Label field as
NM_BARLABEL. You should also note that menus and items are now
enabled by default. If you want to disable a menu or item, use
NM_MENUDISABLED or NM_ITEMDISABLED, respectively, in the
nm_Flags field. The CHECKIT, MENUTOGGLE, and CHECKED flags are
still around, of course, as well as mutual exclusion and the
item highlighting flags. An additional field, nm_UserData, lets
you attach some of your own information to each menu item for
the program to act upon as you see fit when the item is
selected. This is a good place to stick a function pointer for
that item. When the item is selected, you could call the
function attached to the item, without needing to parse the
returned menu ID to figure out which item was selected.
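
The function-pointer idea can be sketched in portable C. This model is not GadTools code: the structure is a cut-down imitation of NewMenu, every sim_ name is hypothetical, and a typed function pointer is used where a real program would store the pointer in the APTR nm_UserData field and fetch it back through the macros GadTools provides.

```c
#include <stddef.h>

enum { SIM_END, SIM_TITLE, SIM_ITEM };   /* illustrative entry types */

struct sim_new_menu {
    int         type;
    const char *label;
    void      (*user_data)(void);   /* plays the role of nm_UserData */
};

static int sim_last_action = 0;
static void sim_do_new(void)  { sim_last_action = 1; }
static void sim_do_quit(void) { sim_last_action = 2; }

static const struct sim_new_menu sim_menu[] = {
    { SIM_TITLE, "Project", NULL },
    { SIM_ITEM,  "New",     sim_do_new },
    { SIM_ITEM,  "Quit",    sim_do_quit },
    { SIM_END,   NULL,      NULL }
};

/* Call the handler attached to item `item` (0-based) of menu `title`.
   Returns 1 if such an item exists, 0 otherwise. */
static int sim_select(const struct sim_new_menu *m, int title, int item)
{
    int t = -1, i = -1;

    for (; m->type != SIM_END; m++) {
        if (m->type == SIM_TITLE) {
            t++;
            i = -1;
            continue;
        }
        if (m->type == SIM_ITEM && t == title && ++i == item) {
            if (m->user_data)
                m->user_data();
            return 1;
        }
    }
    return 0;
}
```

With the handler stored beside the label, adding a menu item means adding one array row. No separate switch statement on menu and item numbers has to be kept in step with the layout.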

You might have noticed by now that there does not seem
to be a way to attach Images to items anymore. You're
partially right: there only seems to be no way of doing it. If
you want to use an Image as an item or subitem, you need to use
IM_ITEM or IM_SUB as the nm_Type specifier. In these cases,
nm_Label points to an Image structure.

THE MAIN COURSE 

Gadtools.library provides a set of functions that assist in
allocating and deallocating standard 1.3 menu-strips, based on
the NewMenu layout. To create a menu that is ready for
attaching to a window, all you need to do is pass your
NewMenu array pointer to the GadTools function CreateMenus(),
which performs all the Menu, MenuItem, and IntuiText
structure allocations and linking. For example:

menu = CreateMenus(newmenu, tag1, ...);

CreateMenus() returns a pointer to a standard, dynamically
allocated 1.3 menu-strip that you can parse and play
with just like in the old days. You will need to do a little
extra work, however, if you use the UserData field provided by
NewMenus. Because there is no space reserved in the 1.3
Menu and MenuItem structures, when allocating its memory
CreateMenus() tacks on a few bytes to make room for you.
You must use a couple of special macros to get access to your
data. To access the UserData field of a Menu, use the
GTMENU_USERDATA() macro, and pass it a pointer to the
desired Menu structure. For an item, use
GTMENUITEM_USERDATA(), and pass a MenuItem pointer.

Now that you have your allocated menu-strip, you must
send it to one more initialization function: LayoutMenus()
handles all of the positional and size calculations, based on
the specified Menus, MenuItems, display VisualInfo data,
and font.

success = LayoutMenus(menu, visual_info, tag1, ...);

Consequently, gadtools.library must know everything
about the font used in drawing the menu. This font
information is passed to LayoutMenus() through the use of tags.

If you are unfamiliar with 2.0's tag facility, here is a quick
review. (For a complete discussion, see "Digging Deep in the
OS," p. 8.) Tags are an extensible method of adding features
and parameters to many new Intuition objects, and they form
the basis for specifying GadTools gadget options. Tags
consist of a tag type and data pair, and are specified in one of
two ways. First, tags can be passed to most functions through a
variable argument system, where one or more tag pairs is passed
on the stack, followed by an end-tag (TAG_END or TAG_DONE). You
can also send tags as an array of TagItems (defined in
utility/tagitem.h), followed by an end TagItem.
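
The array form of a tag list is easy to model in portable C. The sketch below is an illustration only: the structure mirrors the tag and data pairing described above, but the SIM_ tag identifiers are made up, and the special control tags that real tag lists also support are left out.

```c
#include <stddef.h>

#define SIM_TAG_DONE 0UL            /* end-of-list marker */
#define SIM_TAG_PEN  0x80000001UL   /* made-up tag identifiers */
#define SIM_TAG_FONT 0x80000002UL

struct sim_tag_item {
    unsigned long tag;              /* which parameter this is */
    unsigned long data;             /* the parameter's value */
};

/* Return the data paired with `tag`, or `fallback` if the tag is absent. */
static unsigned long sim_get_tag_data(unsigned long tag,
                                      unsigned long fallback,
                                      const struct sim_tag_item *list)
{
    for (; list != NULL && list->tag != SIM_TAG_DONE; list++)
        if (list->tag == tag)
            return list->data;
    return fallback;
}
```

Because a function asks for each tag by name and falls back to a default when it is missing, callers pass only the options they care about, and new tags can be added later without breaking existing calls.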

CreateMenus() and LayoutMenus() both accept tags as variable
arguments. These actually call CreateMenusA() and
LayoutMenusA(), respectively, which accept an array of
TagItem structures.

Currently only a few tags are supported by NewMenus. The
desired font is specified with the GTMN_TextAttr tag, which is
accepted by LayoutMenus() and is followed by a pointer to a
filled-out TextAttr structure describing the font.
LayoutMenus() returns TRUE if the font could be opened
successfully. Otherwise, it returns FALSE.

You can specify the color that menu-item text is rendered in
by sending the GTMN_FrontPen tag to CreateMenus(), followed by
the desired pen color.

You can find out error information from CreateMenus() by
sending the GTMN_SecondaryError tag, followed by a pointer
to a ULONG, initialized previously to NULL. If an error
occurs during menu-strip creation, one of the following
conditions is set in the variable:

GTMENU_INVALID: The NewMenu structure describes
an illegal menu. CreateMenus() will return NULL.

GTMENU_TRIMMED: The NewMenu structure has too
many menus, items, or subitems, and the resulting menu will
be trimmed down.

GTMENU_NOMEM: There was not enough memory for
CreateMenus() to allocate the entire menu. It returns NULL.

If you pass the GTMN_FullMenu tag to CreateMenus(),
followed by data set to boolean TRUE, CreateMenus() is
forced to build menus based on complete NewMenu arrays.
If a menu fragment is discovered, CreateMenus() returns
NULL.



Once you create your menu-strip with the simple
gadtools.library NewMenu system, the scenario is just like in
1.3. You should have a complete, 1.3-compliant menu-strip
(except for the extra UserData information), ready for use.

Intuition has a new function, ResetMenuStrip(), that works
exactly like SetMenuStrip() except that it's faster, and is
useful only if the menu-strip layout has not changed. Here is
the AutoDoc-condoned sequence of events for handling your
menus:

1. Call OpenWindow().

2. Call SetMenuStrip().

3. Perform zero or more iterations of the following steps:
Call ClearMenuStrip().
Change CHECKED or ITEMENABLED flags.
Call ResetMenuStrip().

4. Call ClearMenuStrip().

5. Call CloseWindow().






GadTools provides one more function for dealing with
NewMenus. When you are finished, you can dispose of the entire
strip by passing the menu pointer (originally given to you
by CreateMenus()) to the FreeMenus() function.



AND FOR DESSERT... 

What can you do if you don't currently have access to 2.0,
but want the simplicity of creating menus using the NewMenus
facility? Fear not, for in the Miller drawer of the
accompanying disk I have included a set of reverse-engineered
functions that (for the most part) act like the
gadtools.library NewMenu equivalents. Link newmenus.o with your
standard 1.3 code, and you will have the ease-of-use of 2.0
NewMenus too! This module is in SAS/C format, so double-check
the source before you recompile it for a different
environment. Note that newmenus.o does not incorporate all of
the features of version 37 NewMenus, but my test menu-strip
(used in the example code) works the same using both
GadTools' and my own functions. See the source for details on
what has been left out or does not work quite as expected. In
case you do not have access to 2.0 includes, I have added a
header file (newmenus.h) that defines all of the required
structures and tags used by the NewMenu functions. Include
this in place of libraries/gadtools.h and you are all set!

NewMenus provides a simple, extensible new format that
makes it much easier to create, edit, and understand complete
menu-strips. Coupled with powerful functions that handle
the gory details of consistent dynamic menu allocation,
layout, and deallocation, gadtools.library's NewMenus brings
sanity back to managing your menu!

Paul Miller has been a developer since 1985 and is currently ma- 
joring in Computer Science at Virginia Tech. Contact him c/o The 
AmigaWorld Tech Journal, 80 Elm St., Peterborough, NH 
03458, or via Internet (pmiller@vttcf.cc.vt.edu). 
Programming Serial.Device 



By Robert Wittner 



PROGRAMMERS WHO NEED better control over serial 
communications than SER: allows or need a reliable way to 
perform serial input and output (I/O) can use the Amiga's 
serial.device directly. As with most other Amiga devices, the 
serial device uses Exec's basic I/O functions and builds upon 
Exec's data structures. Knowing how to program the serial 
device will give you control over all serial parameters, let 
you perform extensive error-checking and asynchronous 
I/O, plus allow you to handle multiport serial boards effectively. 
If you are not comfortable with the basics of Exec I/O, 
consult Chapter 19 of the Amiga ROM Kernel Manual: Libraries 
& Devices before you begin. 

INITIAL STEPS 

There are a few steps that must be completed before any 
serial communication can take place. First, you must create 
an Exec message port to receive notifications from the seri- 
al device. To do so, you use the CreatePort() function, which 
accepts two arguments and returns a pointer to a newly 
created message port (or NULL if the call fails). The first ar- 
gument is a pointer to the port's name or NULL if the port 
is to be anonymous. The ports that are used in conjunction 
with the serial device should be anonymous, because no 
other process will be looking for them and assigning names 
to the ports only causes needless system overhead. The sec- 
ond argument is the port's priority, which applies only to 
named ports and will not be used in the following examples. 
CreatePort()'s complement is DeletePort(), which accepts 
one argument: a pointer to a port to be removed. You cre- 
ate a port as follows: 

struct MsgPort *MyPort; 

if (!(MyPort = (struct MsgPort *)CreatePort(NULL, 0))) 
{ 
    /* CreatePort failed, place an error handler here! */ 
} 

Next, you create and initialize the request structure that 
you will use to issue commands and pass parameters to and 
from the serial device. The IOExtSer structure serves the pur- 
pose. An extension of the Exec IOStdReq structure, the 
IOExtSer structure is defined as: 

struct IOExtSer 
{ 
    struct IOStdReq IOSer; 
    ULONG  io_CtlChar; 
    ULONG  io_RBufLen; 
    ULONG  io_ExtFlags; 
    ULONG  io_Baud; 
    ULONG  io_BrkTime; 
    struct IOTArray io_TermArray; 
    UBYTE  io_ReadLen; 
    UBYTE  io_WriteLen; 
    UBYTE  io_StopBits; 
    UBYTE  io_SerFlags; 
    UWORD  io_Status; 
}; 



The structure members will be defined as they become rel- 
evant. To initialize this structure, use CreateExtIO(), which ac- 
cepts two arguments and returns a pointer to an initialized ex- 
tended I/O request block. This I/O request block is the 
IOExtSer structure in the example. The first argument in Cre- 
ateExtIO() is a pointer to an initialized message port; the sec- 
ond is an integer indicating the size (in bytes) of the extended 
I/O request block. The complement to the CreateExtIO() function 
is DeleteExtIO(). This function accepts a pointer to a previously 
created IOExtSer and disposes of it. The following C 
code fragment creates and initializes an IOExtSer structure: 

struct IOExtSer *MyIOExtSer; 

if (!(MyIOExtSer = (struct IOExtSer *)CreateExtIO(MyPort, 
        sizeof(struct IOExtSer)))) 
{ 
    /* CreateExtIO failed, place an error handler here! */ 
} 

The next function you need is OpenDevice(), which accepts 
four arguments and returns zero on success or a nonzero error 
code on failure. The first argument is a pointer to the 
name of the device to be opened. The second is the unit 
number of the device to which you are requesting access. 
The third is a pointer to your initialized IOExtSer structure 
created with the call to CreateExtIO(). The fourth is a bitmask 
(flags) and is not used with the standard serial.device. 
However, some devices that are serial.device-compatible, 
such as drivers for multiport serial boards, may use this 
argument. Check the documentation that comes with these 
peripherals for details. 

The complement to OpenDevice(), the CloseDevice() function, 
accepts one argument: the pointer to the IOExtSer structure 
you used as the third argument in the OpenDevice() call. 

The structure member MyIOExtSer->io_SerFlags contains 
a bit labeled SERB_SHARED. If this bit is set when you call 
OpenDevice(), it allows other programs to access the same 
port that your program is using. This could be exactly what 
you want, or it could spell disaster. If you want to ensure that 
your program has exclusive use of a serial port, you should 
make sure that this bit is not set. To do this in our example, 
you can perform the following assignment: 

MyIOExtSer->io_SerFlags &= ~SERF_SHARED; 

Note the use of the bitmask SERF_SHARED instead of the 
bit number SERB_SHARED. 
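
The difference between a bit number and a bitmask is easy to get wrong, so here is a minimal, portable sketch of it. The DEMO_ names and the helper functions are made up for illustration; in a real program you would use SERB_SHARED and SERF_SHARED from devices/serial.h rather than these stand-ins.

```c
#include <stdint.h>

/* Hypothetical stand-ins for SERB_SHARED / SERF_SHARED; the real
 * definitions come from devices/serial.h. */
#define DEMO_SERB_SHARED 5                          /* bit NUMBER */
#define DEMO_SERF_SHARED (1 << DEMO_SERB_SHARED)    /* bit MASK   */

/* Return flags with the "shared" bit cleared and all others untouched. */
uint8_t demo_clear_shared(uint8_t flags)
{
    return (uint8_t)(flags & ~DEMO_SERF_SHARED);
}

/* Return nonzero if the "shared" bit is set in flags. */
int demo_is_shared(uint8_t flags)
{
    return (flags & DEMO_SERF_SHARED) != 0;
}
```

Masking with the complement of the bit number instead of the bitmask would clear the wrong bits (bits 0 and 2 for a bit number of 5) and leave the shared bit alone.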

If you plan to use the examples given below, you must in- 
clude some C header files in your code. You may also want 
to examine the data structures and constant declarations in 
these files for educational purposes. The obvious file to include 
is devices/serial.h, which contains all the device-specific 
data structures and constant definitions. In its current 
form, devices/serial.h automatically includes exec/io.h, 
which contains data structures and constants for 
device-independent I/O (CMD_READ, CMD_WRITE, and others 
mentioned below). Another possibility is exec/memory.h, if 
you plan to use Exec's memory-allocation capabilities. Finally, 
most programs make use of exec/types.h, which contains 
definitions for such items as UBYTE, UWORD, and other 
common type-defined data types. 

With all this in place, you can now "open" the serial device. 
Note that the string "serial.device" in the example below can 
be replaced with the driver of any serial.device-compatible 
device. Also, you can change the unit number to indicate 
which port on a multiport board you want to use. Keeping 
with the previous examples, the following code allocates the 
built-in serial port for use by your process: 

#define DEV_NAME "serial.device" 
#define UNIT_NUM 0    /* unit 0 is the built-in serial port */ 

/* OpenDevice() returns zero on success, nonzero on failure */ 
if (OpenDevice(DEV_NAME, UNIT_NUM, (struct IORequest *)MyIOExtSer, 0)) 
{ 
    /* The open failed. Reasons: 
       1. The named device and unit do not exist 
       2. The device is being used exclusively by another program 
       3. We couldn't secure exclusive use of this port 
    */ 
} 

If this last step is completed successfully, the requested 
serial port is available for use. If you encounter an error, you 
must remember to dispose of what you allocated by using the 
functions CloseDevice(), DeleteExtIO(), and DeletePort() in 
the order listed here (call CloseDevice() only if the 
OpenDevice() call succeeded). You must perform a CloseDevice() on 
your allocated port, especially if you were granted exclusive 
access. If you fail to relinquish control of a port upon program 
termination, no other programs will be able to use that port 
until the Amiga is reset. 

DATA TRANSFER 

At this point, you can begin transmitting and receiving 
data. You have a choice of two methods. The first is syn- 
chronous I/O, in which you issue a command to the device 
and wait for its completion. The DoIO() function issues the 
command to the device and then waits for the command to 
complete before returning control to your process. 

You have to fill in some of the IOExtSer structure members 
with data, however, before calling DoIO(). For this example, 
the IOSer.io_Data field must be filled in with a pointer to the 
data to be sent. For receiving data, this would point to a 
buffer area for storage of incoming data. Next, the 
IOSer.io_Length field must be assigned a number indicating how 
many characters are to be written (or read). Following that, 
the IOSer.io_Command field must contain the command to 
be executed, as defined in devices/serial.h. Certain commands, 
such as SDCMD_SETPARAMS (described below), 
may require that the IOSer.io_Flags field contain valid data. 

The following code writes a string to the serial port syn- 
chronously: 

MyIOExtSer->IOSer.io_Data = (APTR)"Only Amiga makes it possible!"; 
MyIOExtSer->IOSer.io_Length = 29; 
MyIOExtSer->IOSer.io_Command = CMD_WRITE; 
DoIO(MyIOExtSer); 

Upon return, MyIOExtSer->IOSer.io_Actual will contain 
the actual number of characters written to (or read from) the 
device. Additionally, MyIOExtSer->IOSer.io_Error will 
contain the error code if an error occurred during the execution 
of the most recent command. 

This first approach has one critical drawback. The function 
DoIO() does not return control to the calling process until the 
I/O is complete or an error occurs. If you issue a read com- 
mand to the device and no data ever arrives at the port, your 
process is effectively locked out and will never regain con- 
trol of the CPU. This inherent problem eliminates DoIO() as 
an effective means of I/O for a great number of applications, 
including terminal programs and bulletin board systems. 

The solution is the second method of data transfer: 
asynchronous I/O. Asynchronous I/O involves the discrete 
functions normally called by DoIO(), namely SendIO(), 
CheckIO(), and WaitIO(). 

First, you use SendIO() to issue the command to the device. 
SendIO() is similar to DoIO() (it takes the same parameters) 
but does not wait for the I/O operation to complete. You can 
then use CheckIO(), which is also similar to DoIO() but 
returns a value indicating the status of the requested command 
(sent by SendIO()). Finally, WaitIO() lets you wait for 
completion (if your request is not already complete), remove the 
"completion" message from the port, and perform some 
clean-up operations. 

You must be careful not to write code that calls CheckIO() 
in a loop to determine whether an I/O request has complet- 
ed. This technique is known as "busy-waiting" and degrades 
performance across the entire system by stealing CPU time 
away from other running processes. Instead, you should use 
the Exec Wait() function, which wakes up the task when the 
request has been completed. Here's an example of a system- 
friendly, asynchronous read: 



MyIOExtSer->IOSer.io_Data = (APTR)MyBuffer; 
MyIOExtSer->IOSer.io_Length = 1; 
MyIOExtSer->IOSer.io_Command = CMD_READ; 

SendIO(MyIOExtSer); 

/* Other processing can occur here... */ 

/* Now we wait for the request to complete */ 
Wait(1L << MyPort->mp_SigBit); 

/* Now get rid of the reply and clean up */ 
WaitIO(MyIOExtSer); 

MyIOExtSer->IOSer.io_Actual contains the number of characters 
read upon completion, MyIOExtSer->IOSer.io_Error contains any 
error codes, and MyBuffer contains the character that was read. 

Most real-world programs have several ports that receive 
messages on a regular basis. To wait for any port to receive 
a message, you logically-OR the signal bits together from all 
of the involved ports and Wait() for any one (or more) of the 
aggregate signals to become active. Here's an example with 
an IDCMP message port attached to an Intuition window: 

MyActiveSignals = Wait((1L << MyPort->mp_SigBit) | 
                       (1L << MyWindow->UserPort->mp_SigBit)); 

if (MyActiveSignals & (1L << MyPort->mp_SigBit)) 
{ 
    /* Our serial request has completed */ 
} 

if (MyActiveSignals & (1L << MyWindow->UserPort->mp_SigBit)) 
{ 
    /* There may be an IntuiMessage for us */ 
} 
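
The mask arithmetic in the fragment above can be tried out away from the Amiga. The following is only a portable sketch: the two helper functions are hypothetical, and the bit numbers passed to them stand in for the mp_SigBit values of real message ports.

```c
/* Build a Wait()-style mask from an array of signal bit numbers (0-31). */
unsigned long demo_signal_mask(const int *sig_bits, int count)
{
    unsigned long mask = 0UL;
    int i;

    for (i = 0; i < count; i++)
        mask |= 1UL << sig_bits[i];
    return mask;
}

/* Nonzero if the signal with bit number sig_bit is present in received,
 * where received plays the role of the value returned by Wait(). */
int demo_signal_fired(unsigned long received, int sig_bit)
{
    return (received & (1UL << sig_bit)) != 0;
}
```

Because more than one signal can be set in the returned mask, each port should be tested with its own if statement rather than an if/else chain.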

It can be frustrating to attempt to read one character at a 
time using this method. Depending on the speed of your ma- 
chine and efficiency of your code (compiler), reading one 
character at a time limits your throughput to about 1200- 
2400 baud. 

To have your program keep up with 2400+ baud throughput, 
you must adopt a different input technique, one that 
involves a new command, called SDCMD_QUERY. The command 
should be issued synchronously (with DoIO()), because 
it always returns immediately. Upon return, 
MyIOExtSer->IOSer.io_Actual contains the number of characters 
waiting in the buffer, and MyIOExtSer->io_Status contains several 
bits (defined in devices/serial.h) that indicate the current 
status of your allocated port. The method's basic steps are: 

1. Set up a CMD_READ request for one character and issue 
the request. 

2. When the request completes, one or more characters have 
been received; the first character is already in MyBuffer. 

3. At this point, check MyIOExtSer->IOSer.io_Error to see if 
a break has been sent or if an error has occurred. 

4. Next, issue a SDCMD_QUERY request. The request should 
be issued synchronously. 

5. MyIOExtSer->io_Status will contain status bits indicating 
the current state of the serial device. This could be 
checked for carrier loss, ring indication, and so on. 

6. MyIOExtSer->IOSer.io_Actual will contain the number of 
unread characters in the buffer. Now, issue a CMD_READ 
request for the MyIOExtSer->IOSer.io_Actual number of 
characters. This request should also be issued synchronously, 
because the characters are waiting in the read buffer. 

7. Process these characters as the applica- 
tion dictates. 

8. Go to step 1. 

Here is the sample code: 



/* Note: Assumes OpenDevice() call has 
 * completed successfully. 
 * ProcessInput() is a dummy function which handles input */ 

char *MyBuffer; 
UBYTE Done = 0; 

/* Allocate a buffer of the same size as the serial device's 
 * internal buffer plus one just in case one extra character 
 * arrives after we tuck away the initial character */ 
MyBuffer = (char *)AllocMem(MyIOExtSer->io_RBufLen + 1, 0L); 
if (!MyBuffer) 
{ 
    /* Couldn't allocate memory, handle the error! */ 
} 

while (!Done) 
{ 
    MyIOExtSer->IOSer.io_Data = (APTR)MyBuffer; 
    MyIOExtSer->IOSer.io_Length = 1; 
    MyIOExtSer->IOSer.io_Command = CMD_READ; 
    SendIO(MyIOExtSer); 

    /* Now we wait for the request to complete */ 
    Wait(1L << MyPort->mp_SigBit); 

    /* Remove the reply from the port before reusing the request */ 
    WaitIO(MyIOExtSer); 

    /* The character is in MyBuffer */ 
    MyIOExtSer->IOSer.io_Command = SDCMD_QUERY; 
    DoIO(MyIOExtSer); 

    if (MyIOExtSer->IOSer.io_Actual) 
    { 
        /* There are characters in the buffer */ 
        /* Store them right after the character we just received */ 
        MyIOExtSer->IOSer.io_Data = (APTR)(MyBuffer + 1); 
        MyIOExtSer->IOSer.io_Length = MyIOExtSer->IOSer.io_Actual; 
        MyIOExtSer->IOSer.io_Command = CMD_READ; 
        DoIO(MyIOExtSer); 

        Done = ProcessInput(MyBuffer, MyIOExtSer->IOSer.io_Actual + 1); 
    } 
    else 
    { 
        /* Nothing else was waiting; handle the single character */ 
        Done = ProcessInput(MyBuffer, 1); 
    } 
} 

/* Return our buffer memory when finished; the size must match 
 * the size passed to AllocMem() */ 
FreeMem(MyBuffer, MyIOExtSer->io_RBufLen + 1); 
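
The listing above relies on the queued character count always fitting into the space left in MyBuffer. If you would rather not depend on that, a small guard can clamp the follow-up read. This is a sketch only; demo_batch_length() is a hypothetical helper, not part of the serial device interface.

```c
/* How many queued characters can safely be read into a buffer that
 * already holds `used` characters out of `capacity`. */
unsigned long demo_batch_length(unsigned long queued,
                                unsigned long capacity,
                                unsigned long used)
{
    unsigned long room = (used < capacity) ? capacity - used : 0UL;

    return (queued < room) ? queued : room;
}
```

In the loop, the result would be assigned to io_Length in place of the raw io_Actual value, with capacity set to the size passed to AllocMem() and used set to 1 for the character already received.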

CHANGING PARAMETERS 

With a few exceptions, you can change most of the serial 
port's parameters at any time with the SDCMD_SETPARAMS 
command. This command modifies the operation 
of your port according to the values you place in the follow- 
ing variables before issuing the command: 

MyIOExtSer->io_CtlChar      Control characters for XON/XOFF 
MyIOExtSer->io_RBufLen      Size of the device's input buffer 
                            *** Note: The buffer is re-allocated when 
                            you change this value, so all data in the 
                            buffer is LOST! 
MyIOExtSer->io_ExtFlags     Controls MARK/SPACE parity 
MyIOExtSer->io_Baud         Baud rate for reads and writes 
MyIOExtSer->io_BrkTime      Break duration in microseconds 
MyIOExtSer->io_TermArray    Array of termination characters. See the 
                            ROM Kernel manual about this and EOFMODE. 
MyIOExtSer->io_ReadLen      Bits per character (read) 
MyIOExtSer->io_WriteLen     Bits per character (write) 
MyIOExtSer->io_StopBits     Number of stop bits 
MyIOExtSer->io_SerFlags     Can change any flag except SERF_SHARED, 
                            SERF_XDISABLED, and SERF_7WIRE. These must 
                            be set/reset before the OpenDevice() call 
                            is made. 

Here's an example: 

/* Change to 2400 baud */ 
MyIOExtSer->io_Baud = 2400; 
MyIOExtSer->IOSer.io_Command = SDCMD_SETPARAMS; 
DoIO(MyIOExtSer); 
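
When picking values for io_Baud, io_ReadLen/io_WriteLen, and io_StopBits, it helps to know how many characters per second a setting can carry. The sketch below assumes ordinary asynchronous framing (one start bit, the data bits, an optional parity bit, and the stop bits); the function names are hypothetical and nothing here calls the serial device.

```c
/* Characters per second for a given line setting. Each character is
 * framed by one start bit, optional parity, and the stop bits. */
unsigned long demo_chars_per_second(unsigned long baud,
                                    unsigned data_bits,
                                    unsigned parity_bits,
                                    unsigned stop_bits)
{
    unsigned long frame = 1UL + data_bits + parity_bits + stop_bits;

    return baud / frame;
}

/* Time one character occupies the line, in microseconds
 * (returns 0 for a baud rate of 0). */
unsigned long demo_usec_per_char(unsigned long baud,
                                 unsigned data_bits,
                                 unsigned parity_bits,
                                 unsigned stop_bits)
{
    unsigned long frame = 1UL + data_bits + parity_bits + stop_bits;

    return baud ? (frame * 1000000UL) / baud : 0UL;
}
```

At 2400 baud with 8 data bits, no parity, and one stop bit, this works out to 240 characters per second, or a little over 4 milliseconds per character, which is the time budget a one-character-at-a-time read loop has to meet.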

FAST I/O 

To obtain the highest possible throughput, you should call 
another library routine that was designed with speed in 
mind: BeginIO(). You can treat BeginIO() almost the same 
as SendIO() with one important exception: the IOB_QUICK 
bit in MyIOExtSer->IOSer.io_Flags. To exploit this difference, 
turn on the bit IOB_QUICK in MyIOExtSer->IOSer.io_Flags 
and call BeginIO(). When BeginIO() returns to you, check the 
status of the IOB_QUICK bit. If the bit is still set, the request 
has already completed, without the need for a reply message, 
a possible Wait(), or a task switch. This technique is 
demonstrated below: 

MyIOExtSer->IOSer.io_Data = (APTR)MyBuffer; 
MyIOExtSer->IOSer.io_Length = 1; 
MyIOExtSer->IOSer.io_Command = CMD_READ; 
MyIOExtSer->IOSer.io_Flags |= IOF_QUICK; 

BeginIO(MyIOExtSer); 

/* Test to see if the request has already been completed */ 
if (MyIOExtSer->IOSer.io_Flags & IOF_QUICK) 
{ 
    /* Request has completed */ 
    /* The character is in MyBuffer */ 
    /* No WaitIO() is necessary */ 
} 
else 
{ 
    /* Not Quick I/O so we wait for the request to complete */ 
    Wait(1L << MyPort->mp_SigBit); 

    /* Now get rid of the reply and clean up */ 
    WaitIO(MyIOExtSer); 
} 
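
To make the quick I/O handshake concrete, here is a small mock that can be compiled anywhere. It is not the serial device: DemoQuickReq, DEMO_IOF_QUICK, and demo_begin_io() are invented for illustration, and the mock assumes a device leaves the quick bit set only when it can satisfy the request on the spot.

```c
#define DEMO_IOF_QUICK 0x01   /* hypothetical stand-in for IOF_QUICK */

struct DemoQuickReq {
    unsigned char flags;      /* plays the role of io_Flags  */
    unsigned long length;     /* plays the role of io_Length */
    unsigned long actual;     /* plays the role of io_Actual */
    int           pending;    /* nonzero while the mock device owns it */
};

/* Mock of a device's BeginIO entry point. `buffered` is the number of
 * characters the mock device already holds. */
void demo_begin_io(struct DemoQuickReq *req, unsigned long buffered)
{
    if ((req->flags & DEMO_IOF_QUICK) && buffered >= req->length) {
        /* Satisfied immediately: leave QUICK set, no reply message. */
        req->actual = req->length;
        req->pending = 0;
    } else {
        /* Must be queued: clear QUICK so the caller knows to wait. */
        req->flags &= (unsigned char)~DEMO_IOF_QUICK;
        req->actual = 0;
        req->pending = 1;
    }
}

/* Caller-side check, mirroring the test of io_Flags after BeginIO(). */
int demo_completed_quickly(const struct DemoQuickReq *req)
{
    return (req->flags & DEMO_IOF_QUICK) != 0;
}
```

The point of the design is that the caller pays for a Wait() and WaitIO() only on the slow path; when data is already buffered, the request finishes inside the call.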



SIMULTANEOUS REQUESTS 

Applications such as terminal programs need to have a read 
request pending while being able to write to the serial device. 
You can accomplish this by setting up two message ports, each 
with its own request structure for the same open device: Create 
a message port and an IOExtSer structure, then open the device as 
in the example given earlier. When this is completed, create a 
second message port and a second IOExtSer structure. 

You then copy the first IOExtSer structure to the second 
IOExtSer, but change the mn_ReplyPort of the second struc- 
ture to point to the second message port. For example: 

struct MsgPort *MyPort1, *MyPort2; 
struct IOExtSer *MyIO1, *MyIO2; 

if ((!(MyPort1 = (struct MsgPort *)CreatePort(NULL, 0))) 
    || (!(MyPort2 = (struct MsgPort *)CreatePort(NULL, 0)))) 
{ 
    /* Couldn't create one or both ports */ 
} 

if ((!(MyIO1 = (struct IOExtSer *) 
        CreateExtIO(MyPort1, sizeof(struct IOExtSer)))) 
    || (!(MyIO2 = (struct IOExtSer *) 
        CreateExtIO(MyPort2, sizeof(struct IOExtSer))))) 
{ 
    /* Couldn't create one or both IOExtSer structures */ 
} 

/* Open the device with MyIO1 here, as shown earlier, 
 * before making the copy */ 

CopyMem(MyIO1, MyIO2, sizeof(struct IOExtSer)); 
MyIO2->IOSer.io_Message.mn_ReplyPort = MyPort2; 
/* We can now use MyIO1/MyIO2 independently */ 
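
The copy-then-repoint step is the heart of this technique, so here it is again as a portable mock. The structures below are simplified inventions (DemoPort, DemoSerReq), not the real MsgPort or IOExtSer, and memcpy() stands in for Exec's CopyMem().

```c
#include <string.h>

struct DemoPort { int id; };

struct DemoSerReq {
    struct DemoPort *reply_port;   /* plays the role of mn_ReplyPort   */
    void            *device;       /* filled in by the "open" step     */
    unsigned long    unit;
    unsigned short   command;
};

/* Clone an opened request so that the copy talks to the same device
 * and unit but replies to its own port. */
void demo_clone_request(struct DemoSerReq *dst,
                        const struct DemoSerReq *src,
                        struct DemoPort *dst_port)
{
    memcpy(dst, src, sizeof(*dst));   /* inherit device and unit */
    dst->reply_port = dst_port;       /* but reply somewhere else */
}
```

The copy has to be made after the open, because the open is what fills in the device and unit fields that both requests need to share.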

ON DISK 

The source code fragments contained in this article are 
available on the accompanying disk in the Wittner drawer. I 
have also included the C source code and executable file for 
a small, generic terminal program that demonstrates the tech- 
niques shown in this article. Study them and you will soon 
be ready for your own projects. ■ 

Robert Wittner, a Programmer Analyst for Rockwell International, 
is working on the space station Freedom program and has 
been programming the Amiga for four years. Contact him c/o The 
AmigaWorld Tech Journal, 80 Elm St., Peterborough, NH 
03458. 
Hard Disks: 
How Fast Are They Really? 



By Michael Sinz 



500K/second...34 files/second...this drive...that 
controller...DMA...non-DMA...multitasking friendly...video 
speed...millisecond access times...SCSI...ST-506...AT... 
IDE...Adaptec...OMTI... 

The amount of confusing, conflicting, and just plain wrong 
information about hard drives is extreme. Maybe the reason is 
that the Amiga used to have slow hard drives or that the Ami- 
ga now has some of the fastest hard drives in the industry. 
Much of it is because of a misunderstanding about what the 
various terms and numbers mean. Before I can explain these 
terms and how the numbers relate to how fast the system real- 
ly is, you should understand the basic technical issues involved 
in what a hard disk drive 
does — beyond simply stor- 
ing data. 

As you know, data within a computer is just a series of 1s and 
0s. To store this data, the computer must, in some way, be able 
to record the 1s and 0s so that they can be read back as the same 
pattern that was written. One of the most popular methods is 
magnetic recording. In much the same way as audio tape records 
and plays sounds, the computer generates a signal, or sound, 
records it, and then decodes it before playing it back when the 
information is requested. Computers have done this on magnetic 
tape, magnetic drums, magnetic-plated media, spinning 
magnetic tape (which became the floppy), and sealed 
magnetic-plated media. This has always been one of the most 
complex and fastest advancing fields of computer technology. 
Not much more than ten years ago, sealed-media hard-disk 
drives (known as Winchesters) were putting a whopping 
5 to 10 million bytes on 8-inch disks. Today, small 3½-inch 
drives can store over 1,000 million bytes. 

Disk drives — from floppies to hard disks to Winchesters — 
work much the same at the physical level. A disk drive con- 
tains one or more round, flat discs that are coated with a mag- 
netic particulate substance, usually some form of metal oxide. 
This magnetic coating stores the signal from the computer. 
Figure 1: The basic arrangement of a hard disk. 



To read and write this data, a magnetic pickup and recording 
module, known as the read/write head, passes over the 
sections of the disk. Disks are divided into parts known as 
tracks. Tracks, while sounding and acting like those of an 
audio record, are actually fully contained loops much like those 
of a race track. Each track has a number of data units, called 
blocks or sectors, stored in a specific order. The order may not 
be as simple as 1, 2, 3; however, the drive (or the controller) 
knows and understands it. The disk in the drive rotates under 
the read/write head, allowing the data to pass under the 
read/write head for interpretation. The speed of the rotation 
depends on the drive and, sometimes, even on the position of the 
read/write head. 

To pick up requested data, the read/write head must be 
positioned over the appropriate track. This is accomplished by 
an actuator arm or head stepping motor assembly. Each of 
the several methods of implementing head-positioning hardware 
has trade-offs. Some are faster at moving the head, some are 
more accurate, and some are much cheaper. Once the 
read/write head is on the right track, it has to wait for the 
correct data to come around for it to read. Because a specific 
block exists in a specific location on the disk, the drive must 
make sure that the read/write head is over that section of the 
disk during the read or write operation. 

There are a number of minor differences in how these de- 
tails are handled by the drive and controller. The general is- 
sues, however, are the same on all of the drive systems, be 
they smart SCSI drives or simple ST-506 drives. (See Figure 
1.) With the basics out of the way, let's explore the impact of 
the above factors and what they mean to disk performance. 

HOW DOES THIS RELATE TO SPEED? 

Find out what the specs really mean and how 
to test your drive's performance. 

Drive spec-sheets often contain information that is either 
unimportant or insignificant. If you read between the lines, 
however, you can find valuable information. One of the numbers 
that is thrown about the most is the average seek (or access) 
time. This is the amount of time it takes the drive to 
move from its current cylinder location to the next requested 
cylinder. (A cylinder is a group of tracks that all have the 
same number on the various discs. For example, all 
the track 0s make up one cylinder.) This number is ballyhooed 
by many drive manufacturers (and users) as each tries 
to one-up the other. While this does not tell you the drive's 
speed, it gives a relatively good first guess. Current drives 
run in the 15 to 25 millisecond range. For example, the average 
seek time for a Quantum ProDrive is 19ms. 

Another value found on the spec-sheets is the data transfer 
rate. While this number may seem to be very important, few 
drive specifications talk about the overall data transfer rate. 
They usually boast a value that is the electrical transfer rate 
of the drive to the controller interface. For example, Quantum 
boasts a two-megabyte-per-second data transfer rate in 
ASYNC SCSI. This is, however, somewhat misleading. While 
the drive electronics could send data to the controller board 
at that speed, the data does not come from the disk at that 
speed. In fact, the drive data transfer speed to the physical 
disk may be much slower; this rate is a combination of the 
speed of the local drive electronics and that of rotation of the 
disk. The fastest a drive could read a track of data would be 
in one revolution of the disk. Therefore, if the disk revolves 
at 2400 RPM, the best speed you could expect would be 40 
complete tracks in one second. (Assuming no delay between 
reading one track and the next.) So, a 2400 RPM drive with 
32 blocks/track would get 1280 blocks/second, which, at 512 
bytes per block, translates to 640K/second. If the drive 
specifications give the rotation speed of the drive and the 
number of blocks on a track, you can calculate the absolute 
maximum transfer speed. 

Figure 2: Three examples of interleave schemes. 
  Simple 8-block track with 1:1 interleave: reads all blocks in 
  order in 1 rotation. 
  Simple 8-block track with 2:1 interleave: reads all blocks in 
  order in 2 rotations. 
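
The transfer-rate estimate in this section is simple enough to put into a few lines of C. This is a sketch under the same assumptions as the text (one full track per revolution, no delay between tracks); the function name is made up, and the block size is a parameter because the 512-byte figure is implied by the 640K result rather than stated on a spec-sheet.

```c
/* Upper bound on sustained transfer rate in bytes per second, assuming
 * one complete track is read per revolution with no delay between
 * tracks. */
unsigned long demo_max_bytes_per_second(unsigned long rpm,
                                        unsigned long blocks_per_track,
                                        unsigned long bytes_per_block)
{
    /* revolutions per second = rpm / 60; divide last to limit
     * rounding for speeds that are not multiples of 60 */
    return (rpm * blocks_per_track * bytes_per_block) / 60UL;
}
```

For a 2400 RPM drive with 32 blocks per track and 512-byte blocks, the function returns 655,360 bytes per second, which is the 640K/second quoted in the text.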

Related to the transfer speed is the interleave that the drive 
can run on. Interleave is a trick to increase the transfer rate if 
the drive electronics are slow, or if the host computer is. The 
simple case of a 1:1 interleave means that the blocks are on 
the disk in order. That is, block 1 is followed by block 2, and 
so on for the whole track. This also means that a complete 
track can be read in one revolution of the disk. If the time be- 
tween reading one block and reading the next is too long, 
however, the next block may have already passed by the 
read/write head. If this happens, the drive will have to wait 
until the block comes around again before it can read it. With 
a 2:1 interleave, the drive has the blocks arranged on the disk 
as every other one. (See Figure 2.) Thus the computer or con- 
troller of the drive has more time between the blocks to get 
ready for the next one. It does, however, reduce the transfer 
speed by making the drive do more revolutions to read the 
track. The fact that a drive/controller combination specifies 
that it can run at a 1:1 interleave means that the electronics 
are fast enough to read the entire track in one revolution. 
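
To see what an interleaved track looks like, the sketch below lays out the blocks of one track for any N:1 interleave. It is only an illustration of the idea: the function name is hypothetical, blocks are numbered from 0 rather than 1, and real drives and controllers may choose their layouts differently.

```c
/* Fill layout[slot] with the logical block stored in each physical
 * slot of a track for an N:1 interleave. Returns 0 on success or -1
 * on bad arguments. */
int demo_interleave_layout(int *layout, int blocks, int interleave)
{
    int slot = 0, block, i;

    if (!layout || blocks <= 0 || interleave <= 0)
        return -1;

    for (i = 0; i < blocks; i++)
        layout[i] = -1;                  /* mark every slot free */

    for (block = 0; block < blocks; block++) {
        while (layout[slot] != -1)       /* skip slots already taken */
            slot = (slot + 1) % blocks;
        layout[slot] = block;
        slot = (slot + interleave) % blocks;
    }
    return 0;
}
```

For an 8-block track, a 2:1 interleave gives the physical order 0 4 1 5 2 6 3 7 and a 3:1 interleave gives 0 3 6 1 4 7 2 5. Reading the blocks in logical order therefore takes about N rotations, so the best-case transfer rate drops to roughly 1/N of the 1:1 figure.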

The other main "selling point" is the interface that the drive 
uses. There are many types of interfaces available, each with 
its own advantage. In general, the smarter interfaces are better 
as they remove much of the grunt work from the 
software/controller end of the equation. The de facto standard on 
the Amiga has become SCSI (Small Computer System Interface), 
defined by ANSI. It places a large amount of intelligence on the 
drive and thus makes it much easier to plug-and-play; just take a 
SCSI-standard drive and connect it to a SCSI controller and 
they will work together. SCSI is also rather fast. At two 
megabytes-per-second in ASYNC transfers and over four MB/second 
in SYNC transfer mode, it currently is much faster than the 
physical limits of the drives. SCSI-2, the new specification, 
goes even faster. (For a complete discussion of the standard, 
see "Inside SCSI," p.17, Aug./Sept. '91.) 

Figure 2 (third panel): Simple 8-block track with 3:1 interleave; 
reads all blocks in order in 3 rotations. 

At the other end of the drive interface spectrum is ST-506, 
which is about as minimal as you can get. The controller 
software needs to know everything about the drive, in- 
cluding physical layout, write-precomp timings, bad-block 
mapping methods, and so on. Because this meant a reduced 
amount of electronics on the drive, ST-506 drives were cheaper 
and easier to make. Today, however, the price difference 
has mostly disappeared. Other drive interface standards include 
ESDI (enhanced ST-506, with less versatility but potentially 
more speed than SCSI) and IDE (actually, IDE-AT 
and IDE-PC; reasonable intelligence and very simple to 
make a controller for). Other interfaces exist but are either 
very costly or rare. 

TO DMA OR TO NOT DMA? 

Actually, that question is rather simple to answer once you 
know exactly what the difference is. DMA stands for Direct 
Memory Access. As the name indicates, a DMA device has 
direct access to the memory system of the computer, mean- 
ing the device can actively read or write data to memory 
without the CPU. 

Figure 3 shows a very over-simplified diagram of how a 
computer (the Amiga in this case) is connected. As you can 
see, each part of the system is connected to the main pipeline 
of data: the Zorro bus. (Zorro is the Amiga bus architecture.) 
As the small diagrams below show, a DMA transfer, once 
started, goes directly from the hard-disk controller board to 
memory — basically only one step. The CPU method, howev- 
er, transfers data from the controller board to the CPU, and 
then out to the memory, thus taking two bus transactions. 

If it were all that easy, the clear winner would be DMA. 
However, there are complications: When doing DMA, the 
controller must be told where to send the data. This takes a 
small amount of time, and does not need to be done in the 
CPU method. In addition, when the DMA is complete, an in- 
terrupt is generated to tell the system. This takes a few cycles 
to respond to. On the other hand, there are also some good 
points: First, when doing DMA, the CPU is free to use as 
many available bus cycles as it wants. Thus, in a multitask- 
ing system, you can continue to run other tasks while data is 
being transferred from the disk. The CPU method keeps the 
CPU very busy doing all of the work and thus leaves less 
CPU available for other tasks. Second, because the DMA 
method uses less bus bandwidth, it lets other devices that 
may be bandwidth-hungry work better. 
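
The trade-off can be put into rough numbers. The Python sketch below is only a toy model, not a measurement: it counts bus transactions for each method, and the fixed costs `dma_setup` and `dma_interrupt` are hypothetical values chosen purely for illustration.

```python
def cpu_transfer_cost(words):
    """Bus transactions for the CPU method: controller -> CPU, then CPU -> memory."""
    return 2 * words


def dma_transfer_cost(words, dma_setup=20, dma_interrupt=10):
    """Bus transactions for DMA: one per word, plus fixed overhead.

    dma_setup and dma_interrupt are made-up costs (in bus-transaction
    units) for telling the controller where to send the data and for
    responding to the completion interrupt.
    """
    return words + dma_setup + dma_interrupt


def break_even_words(dma_setup=20, dma_interrupt=10):
    """Smallest transfer size at which DMA needs fewer transactions than the CPU method."""
    words = 1
    while dma_transfer_cost(words, dma_setup, dma_interrupt) >= cpu_transfer_cost(words):
        words += 1
    return words
```

With these assumed overheads, DMA pulls ahead once a transfer exceeds 30 words; below that, the setup and interrupt costs dominate. The model ignores the CPU time that DMA frees for other tasks, which is often the larger benefit.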

A controller is neither good nor bad simply because it is
designed for DMA or CPU transfer. Good designs, as
well as bad, are possible in either format. Given two equally
good implementations, however, the DMA controller will
have better overall performance.

Notes to the diagram:

(1) Not all controllers are SCSI based. However, most
of them are. For this diagram, SCSI represents the
connection between the disk drive and the controller.

(2) The BUSTER in the Amiga is the bus controller chip. It
is responsible for keeping order on the Zorro bus.
For this diagram, it is drawn as being between the bus and
the CPU.

(3) For the CPU case, the diagram does not show the fact that
the CPU must also be loading instructions and thus has
yet another bus cycle involved.

MEASURING PERFORMANCE

Measuring the performance of a disk subsystem is a rather
interesting science. In addition to the physical limitations of
the drive and controller, there are issues of software technology
at the drive-controller, file-system, and operating-system
levels. In addition, many of the standard testing issues
come into play, such as accuracy of the test, accuracy of the
observation, applicability of the test, and so on.

The accuracy of the test can be defined rather exactly. On
the Amiga, the system has a timer with a resolution of 1/60
second (1/50 in PAL). This comes out to roughly 0.02 seconds.
Thus, any given time reading will be accurate only
within +/- 0.02 seconds. To time a test, the
time must be read at the beginning and at the end of the test. This
results in +/- 0.04 seconds of accuracy. Thus, for the test to
have a +/- 1% accuracy, it would have to run for a minimum
of four seconds.
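
The arithmetic can be written out as a short Python sketch. The function name is an illustration only; the formula is simply the two-reading error divided by the target accuracy.

```python
from fractions import Fraction


def minimum_test_seconds(tick, target_accuracy):
    """Shortest run for which two clock readings stay within target_accuracy.

    tick is the error of a single time reading, in seconds. The start
    and end readings together contribute an error of 2 * tick, so the
    run must last at least 2 * tick / target_accuracy seconds.
    """
    return 2 * tick / target_accuracy
```

Using the figures above, `minimum_test_seconds(Fraction(2, 100), Fraction(1, 100))` gives 4: a 0.02-second reading error and a 1% target require a four-second run. Fractions are used only to keep the arithmetic exact.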

The accuracy of the observation is much more difficult to 
quantify. The issue here is that in doing the observation, the 
test and thus the results are affected. The best that can be done 
is to try to minimize the effect of observing the test while not 
compromising the quality of the observations. 

What the last issue — the applicability of the test — really 
means is how well the test (and its results) relates to the real- 
use performance of the drive. In many ways, this is more im- 
portant than the other two issues, as without reasonable ap- 
plicability the test results would be useless. 

With DiskSpeed, the disk-performance test software that
MKSoft Development is developing, attention has been paid
to making the tests both accurate and realistic. DiskSpeed 3.1
has proven accurate and has become the standard by which
Amiga hard disks and controllers are judged. With
DiskSpeed 4.0 (currently under development) a whole new
set of tests will be possible.

DISKSPEED: THE AMIGA STANDARD 

I first developed DiskSpeed because other disk-drive per- 
formance testers were either highly inaccurate or did not re- 
late well to real-world disk drive usage. The accuracy issues 
are easy to solve; however, the applicability issues took some 
thinking. 

In DiskSpeed 3.1, the accuracy issues were solved by mak- 
ing the tests take a long time, ensuring that the clock's accu- 
racy did not adversely affect the results of the test. In addi- 
tion, the tests were done with as clean a software design as 
possible. 

With DiskSpeed 4.0, I developed a new technology that can
automatically size the test time to give as accurate a result as 
possible. It was important that this be done only in the ap- 
propriate tests, as some tests radically change their results if 
they are run for more iterations. 
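
The article does not describe how DiskSpeed sizes its tests. As a minimal sketch of the general idea, the Python function below repeats a piece of work in growing batches until one batch runs long enough for the clock error to be negligible. The doubling strategy, the function name, and the `clock` parameter are all assumptions made for illustration.

```python
import time


def timed_run(work, min_seconds=4.0, clock=time.perf_counter):
    """Repeat work() in growing batches until one batch lasts min_seconds.

    Returns (iterations, elapsed_seconds) for the final batch. Doubling
    the batch size is an assumed strategy, chosen to keep the sketch short.
    """
    iterations = 1
    while True:
        start = clock()
        for _ in range(iterations):
            work()
        elapsed = clock() - start
        if elapsed >= min_seconds:
            return iterations, elapsed
        iterations *= 2
```

The rate is then `iterations / elapsed`. As the text warns, this is suitable only for tests whose results do not depend on the iteration count.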

The more important, and more difficult, part of designing 
a set of tests is coming up with ones that will show results 
that apply to the real world. In that direction, none of the tests 
use anything other than standard AmigaDOS file I/O calls. 
Some people ask me to add a test that performs direct device
I/O. However, no application would do direct device I/O to
open, read, write, close, or delete a file. Not only would that
be ridiculous, but the amount of work required to write a file
system is well beyond what an application developer should
spend time on.

Now that the tests are to perform AmigaDOS I/O only,
what needs to be tested? This is where you need some knowledge
of the physical limitations of the disk drives and how
application software works. As you already know, the Amiga's
filing system is very powerful and flexible. Much of this
power comes from the way data is laid out on the disk. This layout,
however, makes some operations a bit slower, most noticeably
listing a directory. Listing a directory makes the system
read many blocks of data, often from different areas on
the disk, and most applications and all users run into this performance
issue during everyday use. Therefore, a test that
measures the performance of the drive/controller combination
when scanning a directory provides numbers
that directly relate to user experience.

In addition to scanning directories, it is important to be able 
to create new directory entries, find entries, and delete them. 
Again, these are situations that users run into every time they 
use an application that does anything with a disk. All to- 
gether, these tests are designed to show the performance of 
the file system's directory structure. Note that to make these 
tests fair, the number of files created in the test directory is 
always the same. The speed of access in a directory structure
changes as the number of files changes, and, if this test were
to auto-size itself based on the speed of the device, the results
would no longer be valid.
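
The shape of such a directory test can be sketched in Python. This is not DiskSpeed's code: it uses the host operating system's ordinary file calls in place of AmigaDOS, and the file count of 100 is an arbitrary assumption. The point it illustrates is that the count is fixed rather than auto-sized.

```python
import os
import tempfile
import time

FILE_COUNT = 100  # fixed for every run; the value itself is arbitrary


def directory_test(file_count=FILE_COUNT):
    """Time create, scan, open, and delete for a fixed number of files.

    Returns a dict of files-per-second figures, one per operation.
    """
    results = {}
    with tempfile.TemporaryDirectory() as test_dir:
        names = [os.path.join(test_dir, "file%04d" % i) for i in range(file_count)]

        def rate(action):
            start = time.perf_counter()
            action()
            elapsed = time.perf_counter() - start
            return file_count / elapsed if elapsed > 0 else float("inf")

        def create():
            for name in names:
                open(name, "wb").close()

        def scan():
            if len(os.listdir(test_dir)) != file_count:
                raise RuntimeError("unexpected number of directory entries")

        def open_each():
            for name in names:
                open(name, "rb").close()

        def delete():
            for name in names:
                os.remove(name)

        results["create"] = rate(create)
        results["scan"] = rate(scan)
        results["open"] = rate(open_each)
        results["delete"] = rate(delete)
    return results
```

On a modern host the operating system's cache will dominate these numbers, so they say little about the drive itself; the sketch shows only the structure of the test.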

Another test that helps show the performance of the file 
system and device driver is the Seek/Read test. It helps show 
how well a database application may run, as database operations
tend to be very disk-bound and to access various locations
within a large file. The Seek/Read test reads small
chunks from various locations within the file. The speed with 
which the file system can find the correct data location with- 
in a file and then read a part of it is directly measured by this 
test. (Note that the DiskSpeed 3.1 Seek/Read test was rather 
simplistic and produced uninteresting numbers.) 
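
A Seek/Read test of this kind might look like the following Python sketch. The file size, chunk size, number of reads, and the use of pseudo-random offsets are all assumptions; the article does not give DiskSpeed's actual access pattern.

```python
import os
import random
import tempfile
import time


def seek_read_test(file_size=256 * 1024, chunk=512, reads=200, seed=1):
    """Read small chunks from pseudo-random offsets; return reads per second."""
    rng = random.Random(seed)  # fixed seed so every run visits the same offsets
    with tempfile.TemporaryFile() as f:
        f.write(os.urandom(file_size))
        f.flush()
        start = time.perf_counter()
        for _ in range(reads):
            f.seek(rng.randrange(0, file_size - chunk + 1))
            data = f.read(chunk)
            if len(data) != chunk:
                raise RuntimeError("short read")
        elapsed = time.perf_counter() - start
    return reads / elapsed if elapsed > 0 else float("inf")
```

Seeding the generator keeps the sequence of offsets identical between runs, so two drives are compared on the same workload.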

The final three tests are basic file data read and write tests: 

File Write/Create: Creates a new file and fills in the data. 
The speed of this depends on how fast the filing system can 
locate new empty blocks of disk space for the file. 

File Write: Writes to an old file. The performance here is 
determined by how well the filing system deals with rewrit- 
ing the data in a file that already exists. This will usually be 
faster than the Write/Create test. 

File Read: Reads from an old file. The performance here is 
determined by how quickly the filing system finds the data
blocks of a file.

With DiskSpeed 3.1, each of these three tests was done
with various buffer sizes, ranging from 512 bytes to 262144
bytes. DiskSpeed 4.0 adds a few twists: each test is also
run on LONGWORD-aligned buffers, WORD-aligned
buffers, and BYTE-aligned buffers. Each test is then performed
in fast memory and in chip memory (if you have both
available).

Also new for DiskSpeed 4.0 is provision for selecting the 
sizes of these buffers. While the larger-size buffers are nice 
to play with, remember that most older applications use a 
512-byte buffer only. Many newer applications are using 
4096-byte buffers, as the speed improvement from just increasing
the amount of data read in one I/O call is rather significant.
(DiskSpeed 3.1 helped show this fact.)
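
The effect of buffer size can be explored with a sketch like the one below, which reads the same file with several buffer sizes and reports bytes per second for each. It is written in Python against the host's file calls, not AmigaDOS, and the function names and the one-megabyte test file are illustrative assumptions.

```python
import os
import tempfile
import time


def read_rate(path, buffer_size):
    """Read the whole file in buffer_size pieces; return bytes per second."""
    total = 0
    start = time.perf_counter()
    # buffering=0 makes each read() a separate I/O call of the given size.
    with open(path, "rb", buffering=0) as f:
        while True:
            data = f.read(buffer_size)
            if not data:
                break
            total += len(data)
    elapsed = time.perf_counter() - start
    return total / elapsed if elapsed > 0 else float("inf")


def compare_buffer_sizes(sizes=(512, 4096, 262144), file_size=1024 * 1024):
    """Return {buffer_size: bytes_per_second} for one temporary test file."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(os.urandom(file_size))
        return {size: read_rate(path, size) for size in sizes}
    finally:
        os.remove(path)
```

Fewer, larger I/O calls mean less per-call overhead, which is the improvement the text describes; the exact ratio depends entirely on the system under test.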

In addition to the basic tests, DiskSpeed 3.1 lets you turn
on DMA and CPU stress factors. To show how well the
drive/controller combination worked in a video environment,
the DMA feature increased the amount of bandwidth
the video control chips were using. CPU stress was an attempt
to simulate heavy workloads in the Amiga's multitasking
environment.

With DiskSpeed 4.0, the CPU stress test has been removed.
It turned out to produce results that did not mean much. In
its place is a CPU-availability value that is reported
for each test. This is a rough calculation of the
CPU percentage available during the test. This is a very useful
number, as it tells you whether there is enough CPU time available
to decompress a picture while loading the next one or to handle
user input during disk I/O.

Observing a test always has an impact
on the results. This is a known fact, and DiskSpeed is not able
to get around it. In doing the CPU-availability checking, the
performance of the system may change, because
just the act of counting the CPU time uses some CPU
time and changes the dynamics of the system. However, if all
tests are done the same way, the relative merits of the drives
under test will still be valid.
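
The article does not say how the availability figure is computed. One plausible approach, sketched here in Python purely as an assumption, is to calibrate how fast a low-priority counting loop runs on an idle system and then compare how far the same loop gets while the disk test runs.

```python
def cpu_available_percent(idle_counts_per_second, counts_during_test, test_seconds):
    """Estimate the CPU share left over while a disk test runs.

    idle_counts_per_second: calibrated rate of a low-priority counting
        loop on an otherwise idle system (hypothetical calibration step).
    counts_during_test: how far the same loop counted during the test.
    test_seconds: how long the test ran.
    """
    expected = idle_counts_per_second * test_seconds
    return 100.0 * counts_during_test / expected
```

For example, `cpu_available_percent(50000, 150000, 4.0)` gives 75.0. Under this scheme a calibration that reads slightly low pushes the result above 100, as in `cpu_available_percent(50000, 202000, 4.0)`, which gives 101.0.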

WHY...? 

So, why do the numbers come out the way they do? 

• Why are small transfers so much slower? 

One of the major reasons is the layout of data on the disk. 
As Figure 2 shows, a disk's sectors can be laid out several 
ways. (Most hard drives have much more than eight blocks 
to a track.) Large transfers require the disk drive to send the 
data for a number of blocks. If these were blocks one to eight, 
the drive could read all of them in one revolution of the disk, 
given a 1:1 interleave. If a program asks for only one block 
worth of data at a time, however, the transfer of the first 
block can take so long that the second block will have passed 
by the head before the drive is ready to read it. Therefore the 
disk will have to rotate around until that block is available 
again. In the example, a read of eight blocks transferred one
at a time will take seven full revolutions after the first block 
is processed — seven times slower than the transfer that asked 
for eight blocks at once. This is worst-case. Many drives to- 
day have some caching and read-ahead capabilities that help 
minimize this. 
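
The missed-block effect can be modeled in a few lines of Python. This is a simplified sketch under stated assumptions: a 1:1 interleave, no drive cache or read-ahead, time measured in "block-times" (the time one block takes to pass under the head), and a hypothetical `overhead` standing for the delay before the next request arrives.

```python
import math


def revolutions(blocks, blocks_per_track=8, overhead=0.0):
    """Revolutions needed to read consecutive blocks one request at a time.

    overhead is the delay, in block-times, between finishing one block
    and being ready to read the next. With overhead=0 the result equals
    a single large transfer.
    """
    t = 0.0
    for k in range(blocks):
        # Block k starts passing the head at k, k + track, k + 2 * track, ...
        laps = max(0, math.ceil((t - k) / blocks_per_track))
        start = k + laps * blocks_per_track
        t = start + 1              # one block-time to read the block
        if k < blocks - 1:
            t += overhead          # host processing before the next request
    return t / blocks_per_track
```

With no overhead, `revolutions(8)` is 1.0, one revolution for all eight blocks. With even a small delay, `revolutions(8, overhead=0.1)` is 8.0: each later block is missed and costs roughly one extra revolution, which is the worst case described above.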

• Why are the results inconsistent from one test to another? 
Disk performance testing is a rather complex task. With- 
out special equipment, many things are impossible. When 
DiskSpeed runs, it does not know the exact location of the 
disk relative to the drive heads. As a result, there is a lag be- 
tween the time the drive is asked to read (or write) a block 
and the time that block is under the read/write head. This
time lag is known as rotational latency. The faster the drive 
spins, the shorter this period is. 
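
The size of this lag follows directly from the spindle speed: on average the wanted block is half a revolution away, and in the worst case a full revolution. A small Python sketch of that arithmetic (the function name is illustrative, and any spindle speed passed to it is the caller's assumption, not a figure from this article):

```python
def rotational_latency_ms(rpm):
    """Return (average, worst-case) rotational latency in milliseconds."""
    revolution_ms = 60000.0 / rpm      # time for one full revolution
    return revolution_ms / 2, revolution_ms
```

For a drive spinning at 3600 rpm, for instance, one revolution takes about 16.7 ms, so the average latency is about 8.3 ms. Because DiskSpeed cannot know where the disk is when a request is issued, this random component shows up as run-to-run variation.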

• Why does the CPU test slow the drive speed? 
Depending on the method used to implement the con- 
troller software, the CPU test task, which runs at -127 prior- 
ity, becomes extra overhead. The difference in speed may be 
rather small from the CPU standpoint, but it may be just 
enough to fall prey to the rotational-latency problem. Con- 
sider: When no task is running, waking up a task entails sim- 
ply starting it again. If another task is running at the same 
time, however, the old task must first be put to sleep. This 
work can consume just enough time to make the system miss 
the next block that is coming around and require the system
to wait until the information cycles past again.

• Why does drive performance change as the drive gets older? 
Drive performance does not really change because of a
disk's age. As files are written to the disk and then later removed,
however, the empty areas of the disk become scattered.
When the disk is then tested, the system must seek
each of the locations where part of the data is stored. This 
adds seek time, rotational latency, control overhead, and pro- 
cessor overhead as the information is handled. 

• Why are writes sometimes faster than reads? 

The way the drive works can have a major impact on this. 
If the drive has a cache, a write can be sent to the cache while 
the drive is still waiting for the required position to reach the 
read/write head. Thus, the disk can signal that the write is 
completed when it is not quite done. During the time the sys- 
tem is getting ready for the next write, the drive will hope- 
fully send the last write to the disk. 

• Which number is most important? 

The answer depends on your application and how you use 
your machine. If you often list directories or create files, you 
should pay attention to the directory-manipulation tests, including
the number of files per second that are created, opened, scanned,
and deleted. One of the numbers that is most important to me 
is the small-buffer performance — that is, the performance of 
the drive/controller on buffered reads between 512 bytes and 
4096 bytes. These two sizes are much more representative of 
the size of the read/write buffers of most applications. Artists 
and animators care more about large buffer sizes, because 
high-speed performance for large files is a major factor in 
their work. However, large buffers are useful only if the file 
can be read as one big chunk. 

• Why does the test sometimes show more than 100% avail- 
able CPU? 

Because the total CPU power present must itself be measured
before availability can be calculated, the measurement can be
off by a small amount. The measurement code tries its
best to get an accurate reading, but it does not always succeed.
Most of the time, however, it will notice when accurate measurements
are not possible and will turn off the CPU testing,
because the results would be meaningless.

With the addition of the CPU-availability numbers, a much 
more complete picture of drive and system performance can 
be obtained. As multimedia becomes more important, the 
performance combination of high drive speed and large 
amounts of available CPU power will make it all possible. 

With DiskSpeed 4.0, developers will be able to ensure that 
the designs of their hardware and software live up to the 
performance needs of their users. It will also give the Amiga 
data that proves the performance of the system for real work. 
Applications such as database servers, file servers, and mul- 
timedia programs require as much performance as possible 
in the drive subsystem. The Amiga has the performance to 
outshine most other platforms in this area. ■ 

Michael Sinz, a Systems Software Engineer at Commodore, is responsible
for many parts of the OS. Write to him c/o The AmigaWorld
Tech Journal, 80 Elm St., Peterborough, NH 03458, or
contact him on BIX (msinz).
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quently, all previous software with its own embedded font 
coded in will lose 24 bytes from the system every time it is run. 
There is a function, StripFont(), that removes the TextFontExtension,
but of course the old software does not use it.

The moral of that story is: Do not define instances of graphics
system structures in your code. If a function exists to create a
structure, such as GfxNew() or GetColorMap(), then use that;
otherwise use AllocMem(). It will make our lives at Commodore
easier in the future.

YOU HAVE THE CON 

We hope you like the final results of 2.0. With its new fea- 
tures and stability, the graphics library has been greatly im- 
proved. In fact, the entire Kickstart is now more stable than 
any other previous release. There are plenty of new graphics 
features to make your program writing easier and make your 
programs more attractive. Use the database, and take note of 
anything marked as private! ■ 

Spencer Shanson is one half of the duo at Commodore that is responsible
for maintaining and extending the Amiga's graphics.library.
He was previously employed by British software developers,
including Argonaut Software (to work on Starglider 2) and Burocare
(to develop Scanners and TapeStreamers for the Amiga). Contact
him c/o The AmigaWorld Tech Journal, 80 Elm St., Peterborough,
NH 03458, or on BIX (sshanson) or Usenet (spence@commodore.com
or uunet!cbmvax!spence).
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use problems to access addresses that can be trapped on by 
Enforcer. 

BUY NOW, SO YOU DON'T PAY LATER 

At a small developer meeting last spring, we at CATS were 
disappointed to discover that although we had convinced 
the majority of the audience that they needed Enforcer, a rel- 
atively small percentage of the developers owned the equip- 
ment necessary to run it — an MMU. If you do not have an 
MMU, get one! The investment in an A3000, 68030 card, or 
68000+ MMU card will quickly pay for itself by cutting down 
on your development time and allowing you to catch and 
find software problems with Enforcer. Enforcer and Mung- 
wall are not just for developers and QA departments. Any- 
one who uses or reviews in-house software or software for 
purchase or contract can benefit his company by catching 
hidden software problems during normal usage and exami- 
nation of the programs. Many people at Commodore run En- 
forcer all of the time, including the Vice President of CATS. 
(Keep that in mind if you are trying to impress him with
your software!) ■

Carolyn Scheppner is Technical Manager of CATS (Commodore
Applications and Technical Support) U.S. Contact her c/o The
AmigaWorld Tech Journal, 80 Elm St., Peterborough, NH
03458, or on BIX (cscheppner) or Usenet ({uunet,rutgers}!cbmvax!carolyn).
ginIO() finds CMD_WRITE or SDCMD_BREAK, the request
is directed to the write task. The other device commands,
such as CMD_RESET, are performed "immediately," within
the context of BeginIO(). Most of those other routines are
surrounded by a Forbid()/Permit() pair to avoid unpredictable
interactions with the two associated tasks and reentrancy
problems. (Note, however, that semaphores are usually
a better choice than Forbid()/Permit(), at least for long
operations.)

An important lesson I learned from writing the serial driver
is to perform the work whenever possible within BeginIO(),
rather than shipping it to a task. For example, if a
CMD_WRITE request specifies only a one-byte length, and
the transmit register is empty, then the request should ideally
be performed within BeginIO(). The performance increase,
compared with sending the request to a task, is quite large.
Likewise, if there are enough bytes in the read buffer to satisfy
a CMD_READ request, then copy the bytes to the user's
buffer and return immediately.
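
The read-side fast path can be illustrated with a language-neutral sketch. The Python below is not the driver's code (which is written in assembly for the Amiga); the class and method names are hypothetical stand-ins that show only the decision being made.

```python
from collections import deque


class SerialUnitSketch:
    """Hypothetical stand-in for the driver's per-unit state."""

    def __init__(self):
        self.read_buffer = deque()   # bytes already received
        self.task_queue = deque()    # requests handed to the read task

    def begin_read(self, length):
        """Return data at once if enough is buffered, else queue the request."""
        if len(self.read_buffer) >= length:
            # Fast path: satisfy the request in the caller's context.
            return bytes(self.read_buffer.popleft() for _ in range(length))
        # Slow path: hand the request to the task to complete later.
        self.task_queue.append(length)
        return None
```

The fast path avoids a task switch entirely, which is where the performance gain described above comes from.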

Unlike the RAM disk example, the serial driver fully implements
all device routines, including Open(), Close(), Expunge(),
and AbortIO(), and so represents a complete example
driver. Take a close look; in this case, a few lines of code
may be worth a thousand words! ■

Dan Babcock is an electrical engineering major at Pennsylvania 
State University and an avid assembly programmer. Contact him 
c/o The AmigaWorld Tech Journal, 80 Elm St., Peterborough, 
NH 03458, or on BIX (danbabcock). 



Q.V.C.S 

Quma Version Control System 

QVCS tracks all the changes you make to source code, 
tracks who made the changes, when, and why. Retrieve 
previous versions of your source code. Summarize changes 
made between releases. Delete unwanted revisions. Label 
all modules for a product release. Restrict who can make 
changes. All this and more... 

With QVCS You Can: 

Save a source revision to a QVCS log file. 

Retrieve a revision from a QVCS log file. 

Configure access lists for each file. 

Configure QVCS attributes separately for each file.

Protect files from accidental deletion. 

Associate a version string with a QVCS log file revision. 

Remove a version string association from a QVCS log file. 

Lock the most recent revision in a QVCS log file to prevent others
from modifying the same file.
Unlock the most recent revision in a QVCS log file.
Find out which files are locked and by whom.
Summarize all changes made to a file since a date, since a
revision, or since a release.
Delete unneeded revisions from a QVCS log file. 
Keyword expansion: turn it on, or turn it off.
Compare one file revision to another. 
Use a journal file to record all QVCS actions for a project.
Use UNIX style file wildcards for all QVCS commands. 
QVCS is easily configured for different development styles. 
Works with both 1.3 and 2.0. 
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Flames, suggestions, and cheers from readers. 
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MORE GAMES 

First off, congratulations on what 
looks like an excellent magazine. The 
disk is great to have; I'm not crazy 
about typing pages of source code, a 
la Amiga Transactor.

As a C programmer interested in 
writing games, I would like to know if 
your magazine is planning to run 
some articles on programming games. 
I would really like something (a library,
if possible) for working with
Blitter objects, as the system GEL 
routines are simply too slow for doing 
serious work. 

Zoltan Hunt 
Beeton, Ontario 

As follow-ups to "Arcade Elements" on
page 58 of the October '91 issue, several 
game programming and faster graphics 
articles are in the works. Patience will 
reward you, Zoltan. 



A CHALLENGE 

Please pass along my suggestions 
on to the development community. 

One thing that is rather annoying to 
me is that there has never been a stan- 
dard for joysticks for the Amiga. (I 
don't call the Atari standard a very 
good one!) It seems to me that considering
the Amiga has a two-button
mouse, a game that lets you choose 
between mouse and joystick usage 
would be handicapped if you used the 
joystick version. Even IBM has a two- 
button joystick, as ancient as it is. Why 
not introduce a two- or three-button 
standard and suggest that all develop- 
ers begin using it? 

My second gripe is about software 
that does not support '020 or '030 
accelerators. When are these people
going to begin developing with the
future in mind? I would also like to 
see software that detects if the user 
has a 24-bit graphics card and auto- 
matically enhances the software ac- 
cording to the user's hardware config- 
uration. This has to be the wave of 
things to come. 

Tony Gore 
Charlotte, North Carolina 



MORE RESPECT AND COVERAGE, 
PLEASE 

I work for a major engineering firm 
where there is a conflict over the use 
of C versus FORTRAN. This reflects 
the significant disparagement FOR- 
TRAN has received in the Amiga 
community. At my firm, C was recent- 
ly chosen as the "standard" program- 
ming language, meaning all new pro- 
grams must be written in C. For me, it 
meant unequivocal justification for the 
purchase of a FORTRAN compiler. 
I prefer FORTRAN over C 
because: 

1. Most scientific and engineering 
programs have been, and still are, 
written in FORTRAN. First, because 
the functions (SQRT, EXP, LOG, and
so on) in C are by default double pre- 
cision. Because in FORTRAN they are 
single precision (perfectly sufficient 
for most variables), FORTRAN pro- 
grams run much faster than equiva- 
lent programs in C. Second, working 
and verified code can be readily 
shared among colleagues and easily 
used in new programs. Verification of 
code, required by many clients and all 
government agencies, is an expensive 
process. 

2. FORTRAN compilers are much 
more stable in their standardization. I 
have read too many times that "this 
program was compiled and linked 
using C compiler/linker X and may 
require alteration to compile/link 
using Y." It's bad enough that C com- 
piler/linkers are inconsistent on the 
Amiga platform. What if I want to
upload a program to another mainframe,
workstation, or PC? I've had
very few problems doing this using 
FORTRAN. 

At my firm, the C decision was 
made by the "professional computer 
programmers" (those who know 
everything about structures and oper- 
ating systems and nothing about nu- 
merical analysis and nonlinear regres- 
sion) without consulting those of us 
who use and develop 90% of the pro- 
grams (that entail simple arrays and 
few graphics) for the benefit of clients 
and company profits. 

What got me going on this topic 
(again) was the absence of FORTRAN 
from the list of languages on which 
The AmigaWorld Tech Journal brings 
insight and benefit. Please consider 
publishing FORTRAN articles. 

Jim Marrone 
El Sobrante, California 



LET US KNOW 

What are your suggestions, complaints, 
and hints for the magazine, Amiga develop- 
ers, and your fellow readers? Tell us about 
them by writing to Letters to the Editor, 
The AmigaWorld Tech Journal, 80 Elm 
St., Peterborough, NH 03458, or posting 
messages in the AW.Tech Journal confer- 
ence on BIX. Letters and messages may be 
edited for space and clarity. ■ 
ICD proudly presents Prima™, the high performance,
low cost hard drive for Amiga 500 computers. Prima
blends a large capacity, low power Quantum hard drive
with the AdIDE™ host adapter for an unbeatable
combination.

Prima replaces the internal floppy drive but includes
Shuffle Board™ to make your external floppy drive
DF0:. Prima features auto-booting from FastFileSystem
partitions, high speed caching, auto-configuring, and
A-Max II™ support. Formatted capacities of 52 and 105
megabytes are currently available.

Prima comes complete with instructions, software, and
all the hardware necessary for a simple, clean, no-solder
installation. It does require an A500 with switching
power supply, 1 megabyte of RAM, and an external
floppy drive for setup and installation.

What other products would we include in the "Ultimate
A500"? Of course a four megabyte AdRAM™ 540 and
Flicker Free Video™ with a multi-sync monitor.
Why settle for less?

ICD, Incorporated
1220 Rock Street
Rockford, Illinois 61101
USA (815) 968-2228 Phone
(800) 373-7700 Orders (815) 968-6888 FAX

Prima, AdIDE, AdRAM, Flicker Free Video, and Shuffle Board are trademarks of ICD, Inc. Other brand and product names are registered trademarks or trademarks of their respective holders.
The fastest growing video technology company in the world is looking for programmers to
join our team. We've assembled the hottest development group in the industry here at
NewTek. But we have two slots open for a new project that will blow your socks off. Have
you always dreamed of working on revolutionary technology in a small, focused group? We
need software innovators that want to create the products of tomorrow. Here are the skills
you'll need:

• Strong 68xxx assembly-language programming skills 

• At least 3 years of assembly-language programming experience 

• Ability to write low-level code for time-critical applications 

• Background in high-speed graphics and video applications 

• Experience in programming prototype hardware

• Ability to quickly learn new custom chip architectures 

• Background in low-level I/O and interrupt operations 

• Intimate understanding of Amiga O/S and hardware 

• Experience with video and graphics hardware 

• Ability to read a schematic 

• Strong organizational and project design skills 

• Being a self-motivated, self-teaching, innovator 

• An uncompromising drive for excellence 

If you've got what it takes, you'll be forging ahead where no programmer's ever gone before.
NewTek offers outstanding (and unusual) compensation and benefit packages for the
chosen few who are a cut above. You'll work in an environment created by hackers, designed
to be a hacker's heaven. Wouldn't it be fun to invent things that are featured in USA Today,
Rolling Stone, and TIME? At NewTek your brain can change the world.

Send Resumes to: 

Alcatraz 

c/o NewTek

215 SE 8th St. 

Topeka, KS 66603 
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